serialize-data-formats
About
This skill enables serialization and deserialization across formats such as JSON, XML, YAML, Protobuf, and MessagePack. It helps developers choose the right format based on criteria such as performance, size, and interoperability for APIs, storage, or communication between systems. Use it when you need to optimize data transfer, persist structured data, or migrate between serialization standards.
Quick install
Claude Code
Recommended: npx skills add pjt222/agent-almanac -a claude-code
/plugin add https://github.com/pjt222/agent-almanac
git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/serialize-data-formats
Copy and paste one of these commands into Claude Code to install this skill.
Documentation
Serialize Data Formats
Pick and implement the right data serialization format for a use case, with correct encoding/decoding and performance awareness.
When to Use
- Pick a wire format for API communication
- Persist structured data to disk or object storage
- Exchange data between systems in different languages
- Optimize data transfer size or parse speed
- Migrate from one serialization format to another
Inputs
- Required: Data structure to serialize (schema or example)
- Required: Use case (API, storage, streaming, analytics)
- Optional: Performance needs (size, speed, schema enforcement)
- Optional: Target language/runtime constraints
- Optional: Human readability needs
Steps
Step 1: Select Right Format
| Format | Human Readable | Schema | Size | Speed | Best For |
|---|---|---|---|---|---|
| JSON | Yes | Optional (JSON Schema) | Medium | Medium | REST APIs, config, broad interop |
| XML | Yes | XSD, DTD | Large | Slow | Enterprise/legacy, SOAP, documents |
| YAML | Yes | Optional | Medium | Slow | Config files, CI/CD, Kubernetes |
| Protocol Buffers | No | Required (.proto) | Small | Fast | gRPC, microservices, mobile |
| MessagePack | No | None | Small | Fast | Real-time, embedded, Redis |
| Arrow/Parquet | No | Built-in | Very Small | Very Fast | Analytics, columnar queries, data lakes |
Decision tree.
- Need human editing? → YAML (config) or JSON (data)
- Need strict schema + fast RPC? → Protocol Buffers
- Need smallest wire size? → MessagePack or Protobuf
- Need columnar analytics? → Apache Parquet
- Need in-memory interchange? → Apache Arrow
- Legacy enterprise integration? → XML
Got: Format selected with documented rationale matching use case.
If fail: Requirements conflict (human-readable AND fast)? Prioritize primary use case, note trade-off.
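The decision tree above can be sketched as a small function. This is only an illustration: the `Requirements` class, its field names, and `select_format` are hypothetical, and falling back to JSON when nothing matches is an assumption based on the "broad interop" entry in the table.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical requirement flags; names are illustrative only."""
    human_editable: bool = False
    is_config: bool = False
    strict_schema_rpc: bool = False
    smallest_wire_size: bool = False
    columnar_analytics: bool = False
    in_memory_interchange: bool = False
    legacy_enterprise: bool = False


def select_format(req: Requirements) -> str:
    """Walk the decision tree top to bottom and return the first match."""
    if req.human_editable:
        return "YAML" if req.is_config else "JSON"
    if req.strict_schema_rpc:
        return "Protocol Buffers"
    if req.smallest_wire_size:
        return "MessagePack or Protocol Buffers"
    if req.columnar_analytics:
        return "Apache Parquet"
    if req.in_memory_interchange:
        return "Apache Arrow"
    if req.legacy_enterprise:
        return "XML"
    # Assumption: with no special requirement, default to JSON for broad interop.
    return "JSON"


print(select_format(Requirements(human_editable=True, is_config=True)))
print(select_format(Requirements(strict_schema_rpc=True)))
```

Because the checks run in order, a conflicting request such as human-editable plus smallest wire size resolves to the earlier branch, which mirrors the "prioritize primary use case" advice.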
Step 2: Implement JSON Serialization
import base64
import json
from datetime import datetime, date
from dataclasses import dataclass, asdict

@dataclass
class Measurement:
    sensor_id: str
    value: float
    unit: str
    timestamp: datetime

# Custom encoder for types the json module cannot serialize by default
class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        # datetime is a subclass of date, so one check covers both
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()
        if isinstance(obj, bytes):
            return base64.b64encode(obj).decode('ascii')
        return super().default(obj)

# Serialize
measurement = Measurement("sensor-01", 23.5, "celsius", datetime.now())
json_str = json.dumps(asdict(measurement), cls=CustomEncoder, indent=2)

# Deserialize (timestamp comes back as an ISO 8601 string, not a datetime)
data = json.loads(json_str)

# R: JSON with jsonlite
library(jsonlite)
# Serialize
df <- data.frame(sensor_id = "sensor-01", value = 23.5, unit = "celsius")
json_str <- jsonlite::toJSON(df, auto_unbox = TRUE, pretty = TRUE)
# Deserialize
df_back <- jsonlite::fromJSON(json_str)
Got: Round-trip serialization preserves all data types accurately.
If fail: Type lost (e.g., dates become strings)? Add explicit type conversion in deserialization step.
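One way to add the explicit type conversion mentioned above is a pair of helper functions that own both directions of the round trip. This is a minimal sketch using only the standard library; `to_json` and `from_json` are hypothetical helper names, and it assumes timestamps are stored as ISO 8601 strings.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class Measurement:
    sensor_id: str
    value: float
    unit: str
    timestamp: datetime


def to_json(m: Measurement) -> str:
    """Serialize, writing the timestamp as an ISO 8601 string."""
    payload = asdict(m)
    payload["timestamp"] = m.timestamp.isoformat()
    return json.dumps(payload)


def from_json(text: str) -> Measurement:
    """Deserialize, converting the ISO 8601 string back to a datetime."""
    raw = json.loads(text)
    raw["timestamp"] = datetime.fromisoformat(raw["timestamp"])
    return Measurement(**raw)


original = Measurement("sensor-01", 23.5, "celsius",
                       datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc))
restored = from_json(to_json(original))
assert restored == original  # the datetime survives, timezone included
```

Using a timezone-aware datetime keeps the UTC offset in the serialized string, so the consumer does not have to guess the zone.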
Step 3: Implement Protocol Buffers
Define schema (.proto file).
syntax = "proto3";
package sensors;

message Measurement {
  string sensor_id = 1;
  double value = 2;
  string unit = 3;
  int64 timestamp_ms = 4; // Unix milliseconds
}

message MeasurementBatch {
  repeated Measurement measurements = 1;
}
Generate + use.
# Generate Python code
protoc --python_out=. sensors.proto
# Generate Go code (needs the protoc-gen-go plugin and a go_package option in the .proto)
protoc --go_out=. sensors.proto

from sensors_pb2 import Measurement, MeasurementBatch
import time

# Serialize
m = Measurement(
    sensor_id="sensor-01",
    value=23.5,
    unit="celsius",
    timestamp_ms=int(time.time() * 1000)
)
binary = m.SerializeToString()  # Compact binary

# Deserialize
m2 = Measurement()
m2.ParseFromString(binary)
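Got: Binary output substantially smaller than equivalent JSON, often 3-10x for numeric-heavy messages; string-heavy messages shrink less because string bytes are stored as-is.
If fail: protoc unavailable? Install grpcio-tools, which bundles the compiler (python -m grpc_tools.protoc -I. --python_out=. sensors.proto). Alternative runtimes such as betterproto still generate code through a protoc plugin.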
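To see where the size savings come from, the sketch below hand-encodes the `Measurement` message using the proto3 wire format rules (varint tags, length-delimited strings, little-endian doubles). It is an illustration only, not a substitute for generated code: the helper names are hypothetical, it skips proto3's omission of default values, and the timestamp is an arbitrary example value.

```python
import json
import struct


def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def tag(field_number: int, wire_type: int) -> bytes:
    """A field key is (field_number << 3) | wire_type, itself a varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_string(field_number: int, s: str) -> bytes:
    data = s.encode("utf-8")
    # Wire type 2: length-delimited
    return tag(field_number, 2) + encode_varint(len(data)) + data


def encode_measurement(sensor_id: str, value: float, unit: str, timestamp_ms: int) -> bytes:
    return (
        encode_string(1, sensor_id)
        + tag(2, 1) + struct.pack("<d", value)     # wire type 1: 64-bit little-endian
        + encode_string(3, unit)
        + tag(4, 0) + encode_varint(timestamp_ms)  # wire type 0: varint
    )


fields = {"sensor_id": "sensor-01", "value": 23.5,
          "unit": "celsius", "timestamp_ms": 1735689600000}
binary = encode_measurement(**fields)
as_json = json.dumps(fields, separators=(",", ":")).encode()
print(f"protobuf wire bytes: {len(binary)}, compact JSON bytes: {len(as_json)}")
```

Field names never appear on the wire, only one-byte tags, which is why the binary form stays small even when the JSON keys are long.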
Step 4: Implement MessagePack
import msgpack
from datetime import datetime

# Custom packing for datetime
def encode_datetime(obj):
    if isinstance(obj, datetime):
        return {"__datetime__": True, "s": obj.isoformat()}
    return obj

def decode_datetime(obj):
    if "__datetime__" in obj:
        return datetime.fromisoformat(obj["s"])
    return obj

data = {"sensor_id": "sensor-01", "value": 23.5, "ts": datetime.now()}

# Serialize (typically smaller and faster than JSON)
packed = msgpack.packb(data, default=encode_datetime)

# Deserialize
unpacked = msgpack.unpackb(packed, object_hook=decode_datetime, raw=False)
Got: MessagePack output 15-30% smaller than JSON for typical payloads.
If fail: Language lacks MessagePack support? Fall back to JSON with compression (gzip).
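The JSON-with-gzip fallback can be sketched with the standard library alone. The function names `pack_json_gz` and `unpack_json_gz` are hypothetical; the tagged datetime form reuses the `__datetime__` convention from the MessagePack example.

```python
import gzip
import json
from datetime import datetime


def pack_json_gz(obj) -> bytes:
    """Encode to compact JSON, then gzip the UTF-8 bytes."""
    def encode_datetime(o):
        if isinstance(o, datetime):
            return {"__datetime__": True, "s": o.isoformat()}
        raise TypeError(f"Cannot serialize {type(o).__name__}")

    text = json.dumps(obj, default=encode_datetime, separators=(",", ":"))
    return gzip.compress(text.encode("utf-8"))


def unpack_json_gz(blob: bytes):
    """Reverse of pack_json_gz, restoring tagged datetimes."""
    def decode_datetime(d):
        if "__datetime__" in d:
            return datetime.fromisoformat(d["s"])
        return d

    text = gzip.decompress(blob).decode("utf-8")
    return json.loads(text, object_hook=decode_datetime)


records = [{"sensor_id": "sensor-01", "value": 23.5,
            "ts": datetime(2025, 1, 1, 12, 0)}] * 500
blob = pack_json_gz(records)
assert unpack_json_gz(blob) == records
```

gzip adds a fixed header and trailer, so it tends to pay off on larger or repetitive payloads; very small messages can come out bigger than the plain JSON.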
Step 5: Implement Apache Parquet (Columnar)
import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd

# Create data
df = pd.DataFrame({
    "sensor_id": ["s-01", "s-02", "s-01", "s-03"] * 1000,
    "value": [23.5, 18.2, 24.1, 19.8] * 1000,
    "unit": ["celsius"] * 4000,
    "timestamp": pd.date_range("2025-01-01", periods=4000, freq="min")
})

# Write Parquet (columnar, compressed)
table = pa.Table.from_pandas(df)
pq.write_table(table, "measurements.parquet", compression="snappy")

# Read Parquet (can read specific columns without loading all data)
table_back = pq.read_table("measurements.parquet", columns=["sensor_id", "value"])
df_subset = table_back.to_pandas()

# R: Parquet with arrow
library(arrow)
# Write
df <- data.frame(sensor_id = rep("s-01", 1000), value = rnorm(1000))
arrow::write_parquet(df, "measurements.parquet")
# Read (with column selection — only reads selected columns from disk)
df_back <- arrow::read_parquet("measurements.parquet", col_select = c("value"))
Got: Parquet files 5-20x smaller than CSV for typical tabular data.
If fail: Arrow unavailable? Use fastparquet (Python) or CSV with gzip as fallback.
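The CSV-with-gzip fallback needs nothing beyond the standard library. In this sketch `write_csv_gz` and `read_csv_gz` are hypothetical helpers, and the two-column layout is just an example. Unlike Parquet, CSV stores no types, so the reader converts them back explicitly.

```python
import csv
import gzip
import os
import tempfile


def write_csv_gz(path, rows, fieldnames):
    """Write dict rows to a gzip-compressed CSV file."""
    # Text mode ("wt") lets the csv module write straight into the compressed stream.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def read_csv_gz(path):
    """Read rows back, converting the value column from str to float."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
        return [{"sensor_id": r["sensor_id"], "value": float(r["value"])}
                for r in csv.DictReader(f)]


rows = [{"sensor_id": "s-01", "value": 23.5},
        {"sensor_id": "s-02", "value": 18.2}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "measurements.csv.gz")
    write_csv_gz(path, rows, ["sensor_id", "value"])
    rows_back = read_csv_gz(path)

assert rows_back == rows
```

This fallback gives up column-selective reads: the whole file is decompressed and parsed even when only one column is needed.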
Step 6: Compare Performance
Run benchmarks for your specific data + use case.
import json
import time

import msgpack

data = [{"id": i, "value": i * 0.1, "label": f"item-{i}"} for i in range(10000)]

# JSON
start = time.perf_counter()
json_bytes = json.dumps(data).encode()
json_time = time.perf_counter() - start

# MessagePack
start = time.perf_counter()
msgpack_bytes = msgpack.packb(data)
msgpack_time = time.perf_counter() - start

print(f"JSON: {len(json_bytes):>8} bytes, {json_time*1000:.1f} ms")
print(f"MsgPack: {len(msgpack_bytes):>8} bytes, {msgpack_time*1000:.1f} ms")
Got: Benchmark results guide format selection for prod use.
If fail: Performance insufficient for any format? Consider compression (zstd, snappy) as orthogonal optimization.
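A single timed run is noisy, so a benchmark helper that repeats each measurement and also times decoding can give steadier numbers. The sketch below is one possible shape: `bench` is a hypothetical helper, and zlib from the standard library stands in for zstd or snappy, which need third-party packages.

```python
import json
import timeit
import zlib


def bench(name, encode, decode, data, repeat=5):
    """Report encoded size plus best-of-N encode and decode times."""
    blob = encode(data)
    assert decode(blob) == data, f"{name}: round trip changed the data"
    enc_ms = min(timeit.repeat(lambda: encode(data), number=1, repeat=repeat)) * 1000
    dec_ms = min(timeit.repeat(lambda: decode(blob), number=1, repeat=repeat)) * 1000
    print(f"{name:<10} {len(blob):>8} bytes  enc {enc_ms:6.1f} ms  dec {dec_ms:6.1f} ms")
    return len(blob), enc_ms, dec_ms


data = [{"id": i, "value": i * 0.1, "label": f"item-{i}"} for i in range(10000)]

json_result = bench("JSON", lambda d: json.dumps(d).encode(), json.loads, data)
zlib_result = bench(
    "JSON+zlib",
    lambda d: zlib.compress(json.dumps(d).encode()),
    lambda b: json.loads(zlib.decompress(b)),
    data,
)
```

Any other format can be added by passing its encode and decode callables, which keeps the comparison on identical data and identical timing rules.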
Checks
- Selected format matches use case (documented rationale)
- Round-trip serialization preserves all data types
- Edge cases handled: empty collections, null/None values, Unicode, large numbers
- Performance benchmarked for representative payload sizes
- Error handling for malformed input (graceful failures, not crashes)
- Schema documented (JSON Schema, .proto, or equiv)
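The edge-case and malformed-input checks can be turned into a few assertions. This sketch covers JSON only, using the standard library; `safe_load` is a hypothetical wrapper, and the same cases would need repeating for whichever format is chosen.

```python
import json

edge_cases = [
    [],                      # empty list
    {},                      # empty object
    {"value": None},         # null/None
    {"name": "温度 °C ✓"},    # non-ASCII text
    {"big": 2**64},          # integer beyond the 64-bit range
]

for case in edge_cases:
    assert json.loads(json.dumps(case, ensure_ascii=False)) == case


def safe_load(text: str):
    """Return (value, None) on success or (None, error message) on malformed input."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"


value, error = safe_load('{"sensor_id": "sensor-01", "value": }')
print(error)
```

Python's json module keeps large integers exact, but parsers in other languages may not, so the large-number case is worth testing on the consuming side as well.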
Pitfalls
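- Floating-point precision: JSON does not define numeric precision, and many parsers (JavaScript among them) read every number as an IEEE 754 double, so integers above 2^53 and exact decimals can be silently altered. Use string encoding for financial/decimal precision.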
- Date/time handling: JSON has no native datetime type. Always document format (ISO 8601) + timezone handling.
- Schema evolution: Adding or removing fields can break consumers. Protobuf handles this well; JSON needs careful versioning.
- Binary data in JSON: Base64 encoding inflates binary data by ~33%. Use binary format for binary-heavy payloads.
- YAML security: YAML parsers may execute arbitrary code via `!!python/object` tags. Always use safe loaders (e.g., `yaml.safe_load` in PyYAML).
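Three of the pitfalls above are easy to demonstrate with the standard library. The snippet is a small illustration with made-up values, not a recommended data model.

```python
import base64
import json
from decimal import Decimal

# Large integers: a double has a 53-bit significand, so 2**53 + 1 is not representable.
assert float(2**53 + 1) == float(2**53)

# Decimal precision: carry money as a string and parse it with Decimal on the way in.
payload = json.dumps({"amount": str(Decimal("19.99")), "currency": "EUR"})
amount = Decimal(json.loads(payload)["amount"])
assert amount == Decimal("19.99")
assert 0.1 + 0.2 != 0.3  # the float rounding that string encoding avoids

# Base64 overhead: every 3 input bytes become 4 output characters.
raw = bytes(3000)
encoded = base64.b64encode(raw)
assert len(encoded) == 4000  # about 33% larger
```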
See Also
- design-serialization-schema: schema design, versioning, evolution strategies
- implement-pharma-serialisation: pharmaceutical serialisation (different domain, same naming)
- create-quarto-report: data output formatting for reports
