Batch ETL Integration Guide¶
Learn how to process large volumes of AL3 files using IOSetu in batch mode.
Use Case¶
You have thousands of historical AL3 files (e.g., daily dumps from an agency management system) that need to be converted to a modern format for warehousing.
Strategy 1: Sequential Scripting (Simple)¶
For moderate volumes (< 10GB), a simple shell script or Python script running locally or on a single VM is sufficient. IOSetu processes typical AL3 files in milliseconds.
#!/bin/bash
# Bulk process all .AL3 files in a directory
INPUT_DIR="./raw_al3"
OUTPUT_DIR="./processed_json"
IOSETU_URL="http://localhost:8080/v1/parse"

mkdir -p "$OUTPUT_DIR"

for file in "$INPUT_DIR"/*.AL3; do
    filename=$(basename "$file")
    echo "Processing $filename..."
    curl -sf -X POST \
        --data-binary "@$file" \
        "$IOSETU_URL" > "$OUTPUT_DIR/${filename%.AL3}.json"
    # -f makes curl exit non-zero on HTTP errors, so $? reflects parse success
    if [ $? -eq 0 ]; then
        echo "✓ Success"
    else
        echo "✗ Failed"
    fi
done
Strategy 2: Parallel Processing (High Volume)¶
For high volumes, leverage parallel workers. IOSetu handles concurrent requests efficiently.
Using GNU Parallel¶
mkdir -p ./processed
find ./raw_al3 -name "*.AL3" | parallel -j 8 \
    "curl -sf --data-binary @{} http://localhost:8080/v1/parse > ./processed/{/.}.json"
-j 8: runs 8 concurrent uploads. {} is the input path; {/.} is its basename with the directory and extension stripped.
Using Python Multiprocessing¶
from multiprocessing import Pool
import os
import requests

files = [f for f in os.listdir('./raw') if f.endswith('.AL3')]

def process_file(filename):
    with open(f"./raw/{filename}", 'rb') as f:
        resp = requests.post('http://localhost:8080/v1/parse', data=f.read())
    if resp.ok:
        base = os.path.splitext(filename)[0]
        with open(f"./processed/{base}.json", 'w') as out:
            out.write(resp.text)
        return True
    return False

if __name__ == '__main__':
    with Pool(processes=16) as pool:
        results = pool.map(process_file, files)
    print(f"Processed {sum(results)} files.")
Strategy 3: Pipeline Orchestration (Airflow/Prefect)¶
Integrate IOSetu as a task in your ETL DAG.
Airflow Example:
1. Sensor: Watch S3 bucket for new zip archives.
2. Unzip: Extract AL3 files to a temporary location.
3. Map Task: Send each AL3 file to IOSetu (HTTP Operator).
4. Load: Upload resulting JSONs to Snowflake/Redshift.
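The steps above could be wired together roughly as follows. This is a minimal sketch assuming Airflow 2.4+ and its TaskFlow API with dynamic task mapping; the staging path, service URL, DAG/task names, and the warehouse-load stub are all illustrative, and the S3 sensor/unzip steps are collapsed into a single placeholder task:

```python
# Hypothetical DAG sketch -- names, paths, and the IOSetu URL are placeholders.
from datetime import datetime
from airflow.decorators import dag, task

IOSETU_URL = "http://iosetu.internal:8080/v1/parse"  # assumed service address

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def al3_batch_etl():

    @task
    def list_al3_files() -> list[str]:
        # Stands in for steps 1-2: a real DAG would use an S3 sensor plus an
        # unzip task to populate this staging directory.
        import glob
        return glob.glob("/tmp/al3_staging/*.AL3")

    @task
    def parse_file(path: str) -> str:
        # Step 3: send one AL3 file to IOSetu, write the JSON next to it.
        import requests
        with open(path, "rb") as f:
            resp = requests.post(IOSETU_URL, data=f.read())
        resp.raise_for_status()
        out_path = path + ".json"
        with open(out_path, "w") as out:
            out.write(resp.text)
        return out_path

    @task
    def load_to_warehouse(paths: list[str]):
        # Step 4: hand the JSON files to your warehouse loader (e.g. a COPY).
        print(f"Loading {len(paths)} files")

    # Fan out one parse_file task instance per discovered AL3 file.
    parsed = parse_file.expand(path=list_al3_files())
    load_to_warehouse(parsed)

al3_batch_etl()
```

Dynamic task mapping (`.expand`) gives per-file retries and visibility in the Airflow UI, which a single monolithic "parse everything" task would not.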
Recommendations¶
- Resource Sizing: IOSetu is CPU-bound. If running locally, allocate 1 CPU core per concurrent worker.
- Memory: Memory usage is low per request, but the parsed JSON response can be large. Use NDJSON output (Accept: application/x-ndjson) to minimize client-side memory overhead during ETL.
- Aggregation: If loading into a data warehouse, consider concatenating multiple small JSON outputs into a single large NDJSON/JSONL file before the COPY command for better import performance.
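The aggregation step can be sketched in Python. The helper name is illustrative, and it assumes each .json file in the input directory holds a single JSON document, as produced by the scripts above:

```python
import glob
import json
import os

def concat_to_ndjson(src_dir: str, dest_path: str) -> int:
    """Concatenate per-file JSON outputs into one NDJSON file (one record per line).

    Returns the number of records written.
    """
    count = 0
    with open(dest_path, "w") as out:
        for path in sorted(glob.glob(os.path.join(src_dir, "*.json"))):
            with open(path) as f:
                record = json.load(f)             # validate each file as we go
            out.write(json.dumps(record) + "\n")  # compact, one record per line
            count += 1
    return count

# Example: concat_to_ndjson("./processed_json", "./combined.ndjson")
```

Validating each file with json.load before writing means a truncated or failed parse output is caught here rather than during the warehouse COPY.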