Overview

A connectomics dataset is not one thing; it is a family of representations at different levels of abstraction. Raw images, segmentation volumes, surface meshes, morphological skeletons, and connectivity graphs each capture different aspects of the same underlying biology. Choosing the right representation for a given task is a core technical skill, because each format has characteristic strengths, blind spots, and computational costs.


The representation hierarchy

Raw EM images (voxels)
    ↓ segmentation
Labeled volumes (voxel → segment ID)
    ↓ surface extraction
Meshes (triangulated surfaces)
    ↓ skeletonization
Skeletons (tree graphs with spatial coordinates)
    ↓ synapse assignment
Connectome graph (neurons as nodes, synapses as edges)

Each arrow is an information-reducing transformation. You gain computational efficiency and analytical clarity, but you lose spatial detail. The key question is: what information do you need for your analysis, and what is the cheapest representation that preserves it?


Volumetric data

What it is

The most fundamental representation: a 3D array of voxel intensities (raw images) or voxel labels (segmentation). Every spatial position has a value.
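
As a minimal sketch of what "every spatial position has a value" means in practice, the snippet below builds a toy label volume with NumPy, looks up the segment at one voxel, and converts per-segment voxel counts into physical volume. The array shape, the segment IDs, the (z, y, x) axis order, and the voxel size are all assumptions made for illustration; real datasets define their own.

```python
import numpy as np

# Toy segmentation volume indexed as (z, y, x); 0 is background.
# The shape and segment IDs are invented for illustration.
labels = np.zeros((4, 8, 8), dtype=np.uint64)
labels[:, 1:4, 1:4] = 101    # segment 101: a 4 x 3 x 3 block of voxels
labels[1:3, 5:7, 2:6] = 202  # segment 202: a 2 x 2 x 4 block of voxels

# Every spatial position has a value: look up the segment at one voxel.
segment_at_point = int(labels[2, 5, 3])

# Count voxels per segment and convert to physical volume.
voxel_size_nm = (40, 8, 8)  # hypothetical (z, y, x) voxel size in nanometers
ids, counts = np.unique(labels[labels != 0], return_counts=True)
voxel_volume_nm3 = int(np.prod(voxel_size_nm))
volume_nm3 = {int(i): int(c) * voxel_volume_nm3 for i, c in zip(ids, counts)}
```

The same indexing and counting pattern applies to chunked formats, where a library typically loads only the requested sub-block into an in-memory array like this one.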

Formats

| Format | Description | Typical use |
| --- | --- | --- |
| Neuroglancer precomputed | Chunked, multiscale image pyramid served over HTTP | Web-based browsing (Neuroglancer, Spelunker) |
| N5 | Chunked, compressed, hierarchical format (Java/Python) | Pipeline intermediate storage |
| Zarr | Python-native chunked array format, cloud-friendly | Analysis, cloud storage (S3, GCS) |
| HDF5 | Hierarchical Data Format, self-describing | Legacy, local analysis |
| TIFF stacks | Uncompressed or LZW-compressed image stacks | Raw microscope output, small datasets |

Key properties

When to use volumetric data

Limitations


Surface meshes

What they are

Triangulated surfaces that represent the boundary of each segmented object. Each mesh is a set of vertices (3D points) and faces (triangles connecting vertices).
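
The vertices-plus-faces structure can be sketched with two NumPy arrays: one of 3D points and one of index triples. The tetrahedron below is an invented stand-in for a real neuron mesh, and `surface_area` is a hypothetical helper name, not part of any particular mesh library.

```python
import numpy as np

# A small tetrahedron as a minimal mesh (coordinates are illustrative).
vertices = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
# Each face is a triple of row indices into `vertices`.
faces = np.array([
    [0, 2, 1],
    [0, 1, 3],
    [0, 3, 2],
    [1, 2, 3],
])

def surface_area(vertices, faces):
    """Sum of triangle areas: half the norm of each edge cross product."""
    a = vertices[faces[:, 0]]
    b = vertices[faces[:, 1]]
    c = vertices[faces[:, 2]]
    return float(0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1).sum())

area = surface_area(vertices, faces)
```

Storing faces as indices rather than repeated coordinates means shared vertices are stored once, which is why most mesh formats use this layout.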

How they’re generated

Meshes are typically generated by applying the marching cubes algorithm (or a variant) to the segmentation volume: for each segment, the isosurface at the boundary between that segment and its neighbors is extracted. Ideally, the result is a watertight mesh.
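
Whether an extracted mesh is actually watertight can be tested after the fact. The sketch below does not run marching cubes; it only illustrates one common necessary condition for a closed surface, namely that every edge is shared by exactly two triangles. The function name `is_watertight` and the toy face lists are assumptions for illustration, and a full check would also look at orientation and self-intersections.

```python
from collections import Counter

def is_watertight(faces):
    """Return True if every undirected edge is shared by exactly two faces."""
    edge_counts = Counter()
    for i, j, k in faces:
        for u, v in ((i, j), (j, k), (k, i)):
            edge_counts[(min(u, v), max(u, v))] += 1
    return all(n == 2 for n in edge_counts.values())

# Closed tetrahedron: four triangles over four vertex indices.
closed_faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
# The same surface with one triangle removed, leaving a hole.
open_faces = closed_faces[:3]

closed_ok = is_watertight(closed_faces)
open_ok = is_watertight(open_faces)
```

Edges that appear only once mark the rim of a hole, which is often where a mesh was cut at a chunk or volume boundary.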

Formats

| Format | Description |
| --- | --- |
| OBJ | Simple text format, widely supported |
| PLY | Binary or text, supports vertex attributes (colors) |
| STL | Triangle-only format (ASCII or binary), common in 3D printing |
| Neuroglancer mesh | Chunked, multi-resolution mesh format for web rendering |
| Draco | Google’s compressed mesh format, used in Neuroglancer |

Key properties

When to use meshes

Limitations


Skeletons

What they are

Tree-graph representations of neuron morphology. Each skeleton is a set of nodes (3D coordinates along the neurite centerline) connected by edges (parent-child relationships). The root is typically the soma, and branches represent dendrites and axons.

How they’re generated

Formats

| Format | Description |
| --- | --- |
| SWC | Standard text format for neuron morphologies. Each line: ID, type, x, y, z, radius, parent_ID. Widely supported by morphology tools (NeuroM, Neurolucida, NEURON simulator). |
| JSON skeleton | Used by Neuroglancer and CloudVolume |
| CATMAID skeleton | Database-backed skeleton with annotations |
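
To make the SWC column layout concrete, here is a minimal parsing sketch using only the Python standard library. The four-node morphology is hand-written for illustration, and `parse_swc` and `cable_length` are hypothetical helper names; dedicated morphology tools handle many edge cases that this sketch ignores.

```python
import math

# A tiny hand-written SWC morphology (values are illustrative).
# Columns: ID, type, x, y, z, radius, parent_ID; parent -1 marks the root.
SWC_TEXT = """\
# id type x y z radius parent
1 1 0.0 0.0 0.0 5.0 -1
2 3 3.0 4.0 0.0 1.0 1
3 3 3.0 4.0 12.0 0.8 2
4 2 -6.0 0.0 8.0 0.5 1
"""

def parse_swc(text):
    """Parse SWC text into {node_id: dict} keyed by integer node ID."""
    nodes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        node_id, node_type, x, y, z, radius, parent = line.split()
        nodes[int(node_id)] = {
            "type": int(node_type),
            "xyz": (float(x), float(y), float(z)),
            "radius": float(radius),
            "parent": int(parent),
        }
    return nodes

def cable_length(nodes):
    """Total Euclidean length of all parent-child edges."""
    total = 0.0
    for node in nodes.values():
        parent = nodes.get(node["parent"])
        if parent is not None:  # the root has no parent
            total += math.dist(node["xyz"], parent["xyz"])
    return total

nodes = parse_swc(SWC_TEXT)
roots = [nid for nid, n in nodes.items() if n["parent"] == -1]
length = cable_length(nodes)
```

Because each node stores only its parent, the whole tree is recoverable from a flat list of rows, which is what makes the format easy to read and write.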

Key properties

When to use skeletons

Limitations


Connectome graphs

What they are

The highest-level representation: neurons as nodes, synaptic connections as edges. This is the “connectome”: the wiring diagram.

How they’re constructed

  1. Each segmented neuron = one node
  2. Each detected synapse → identify pre-synaptic and post-synaptic segments → create directed edge from pre to post
  3. Aggregate: multiple synapses between the same pair → edge weight = synapse count (or sum of cleft areas)
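
The three steps above can be sketched in a few lines of standard-library Python. The synapse table and segment IDs are invented for illustration, and this version uses synapse count as the edge weight. It also derives the node set from the synapse table, so a segmented neuron with no detected synapses would need to be added separately.

```python
from collections import Counter

# Hypothetical synapse table: one (pre_segment_id, post_segment_id) per detected synapse.
synapses = [
    (101, 202),
    (101, 202),
    (101, 303),
    (202, 101),
    (101, 202),
]

# Step 1: every segment that appears in the table becomes a node.
nodes = sorted({seg for pair in synapses for seg in pair})

# Steps 2-3: each synapse is a directed pre -> post edge;
# repeated pairs are aggregated into a synapse-count weight.
edge_weights = Counter(synapses)

# Emit the edge list as (pre_id, post_id, synapse_count) rows.
edge_rows = [(pre, post, n) for (pre, post), n in sorted(edge_weights.items())]
```

Note that (101, 202) and (202, 101) remain separate edges, which preserves the direction of each connection.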

Formats

| Format | Description |
| --- | --- |
| Edge list (CSV/TSV) | Simple: pre_id, post_id, weight, synapse_count |
| Adjacency matrix (NumPy/sparse) | N×N matrix, good for linear algebra |
| GraphML / GEXF | XML-based, supports node/edge attributes |
| NetworkX pickle | Python-native, good for analysis |
| Neo4j / graph database | Queryable graph store for large connectomes |
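
As an illustration of how the edge-list and adjacency-matrix forms relate, the sketch below converts a small edge list into a dense N×N NumPy matrix. The edge rows are invented, and the rows-are-presynaptic convention is an assumption; datasets differ, so the orientation should be checked before summing along an axis. For large connectomes a sparse matrix would normally replace the dense array.

```python
import numpy as np

# Hypothetical edge list rows: (pre_id, post_id, synapse_count).
edges = [(101, 202, 3), (101, 303, 1), (202, 101, 1)]

# Map arbitrary segment IDs to contiguous matrix indices.
node_ids = sorted({n for pre, post, _ in edges for n in (pre, post)})
index = {node_id: i for i, node_id in enumerate(node_ids)}

# Convention assumed here: rows are presynaptic, columns are postsynaptic.
adjacency = np.zeros((len(node_ids), len(node_ids)), dtype=np.int64)
for pre, post, count in edges:
    adjacency[index[pre], index[post]] += count

out_strength = adjacency.sum(axis=1)  # synapses made by each neuron
in_strength = adjacency.sum(axis=0)   # synapses received by each neuron
```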

Node attributes

Edge attributes

When to use graphs

Limitations


Worked example: choosing a representation

Question: “Do inhibitory interneurons preferentially target the perisomatic region of pyramidal cells in layer 2/3?”

Analysis needs:

  1. Identify inhibitory and excitatory neurons → need cell-type labels (graph node attributes)
  2. Find synapses between inhibitory → pyramidal pairs → need connectome graph edges
  3. Determine synapse location on the pyramidal cell (perisomatic vs distal dendrite) → need synapse spatial coordinates mapped onto the pyramidal cell morphology

Representation choice: This question requires the connectome graph (for connectivity) plus skeletons (for distance-from-soma measurement at each synapse location). Neither the graph alone (no spatial synapse info) nor the volume alone (too expensive for the network-level query) would suffice.
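
A minimal sketch of the skeleton half of this analysis follows. It assumes the inhibitory-to-pyramidal synapses have already been selected from the graph, snaps each synapse to its nearest skeleton node, and measures path length back to the soma. The skeleton, the synapse positions, and the 30 µm cutoff are all invented for illustration; the cutoff in particular is a placeholder, not a value from the text.

```python
import math

# Hypothetical pyramidal-cell skeleton: node_id -> (xyz in micrometers, parent_id).
# Node 1 is the soma (root, parent -1).
skeleton = {
    1: ((0.0, 0.0, 0.0), -1),
    2: ((0.0, 10.0, 0.0), 1),
    3: ((0.0, 60.0, 0.0), 2),
    4: ((0.0, 60.0, 80.0), 3),
}

# Hypothetical inhibitory synapses onto this cell, as xyz positions.
synapse_xyz = [(1.0, 9.0, 0.0), (0.0, 58.0, 2.0), (0.0, 61.0, 79.0)]

PERISOMATIC_MAX_UM = 30.0  # illustrative cutoff, not a value from the text

def path_length_to_root(skeleton, node_id):
    """Sum edge lengths from node_id up the parent chain to the root."""
    total = 0.0
    xyz, parent = skeleton[node_id]
    while parent != -1:
        parent_xyz, grandparent = skeleton[parent]
        total += math.dist(xyz, parent_xyz)
        xyz, parent = parent_xyz, grandparent
    return total

def nearest_node(skeleton, point):
    """Snap a synapse position to the closest skeleton node."""
    return min(skeleton, key=lambda nid: math.dist(skeleton[nid][0], point))

labels = []
for point in synapse_xyz:
    node_id = nearest_node(skeleton, point)
    distance = path_length_to_root(skeleton, node_id)
    labels.append("perisomatic" if distance <= PERISOMATIC_MAX_UM else "distal")
```

Path length along the skeleton is used rather than straight-line distance to the soma, since a synapse on a dendrite that curves back toward the cell body can be spatially close but far away along the arbor.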

