Graph Representations

Overview

The connectome, at its core, is a graph: neurons are nodes, synaptic connections are edges. But the details of how you encode this graph — directed or undirected, binary or weighted, neuron-level or type-level — profoundly affect what analyses are possible and what conclusions you can draw. This document covers the representational choices that every connectomics analyst must make.

Instructor script: from EM volume to graph

The pipeline

The path from raw EM images to a queryable graph involves several lossy transformations:

Raw images → Segmentation: each voxel assigned to an object (neuron, glia, etc.)
Segmentation → Synapse detection: membrane appositions with vesicles + PSD identified as synapses
Synapse detection → Edge assignment: each synapse assigned a pre-synaptic neuron and post-synaptic neuron
Edge assignment → Graph construction: aggregate synapses into neuron-to-neuron edges

Each step can introduce errors. A segmentation merge error creates false edges. A missed synapse removes a true edge. A synapse with incorrect pre/post assignment creates a wrongly directed edge. The graph is only as reliable as the weakest link in this chain.

Teaching point: “When you analyze a connectome graph, you are analyzing the output of a computational pipeline, not ground truth. Every edge carries implicit uncertainty from segmentation and synapse detection.”

Nodes: what represents a neuron?

Neuron-level nodes

The most common representation: each reconstructed neuron is one node. Node attributes may include:

Attribute	Source	Example
Cell type	Morphological classification or molecular markers	“L2/3 pyramidal”, “PV+ basket”
Soma position	Centroid of soma segmentation	(x=2045.3, y=891.2, z=1567.8) μm
Laminar position	Depth from pia	Layer 2/3, 250 μm from pia
Morphological features	Computed from skeleton/mesh	Total cable length: 4,521 μm
Functional properties	From correlative calcium imaging (MICrONS)	Orientation selectivity: 45°
Reconstruction completeness	Fraction of arbor within volume	0.72 (72% of estimated total)

Compartment-level nodes

Sometimes it’s useful to split a neuron into compartments: soma, axon, individual dendritic branches. Each compartment becomes its own node. This enables questions like “which branch of neuron A receives input from neuron B?” but dramatically increases graph size.

Type-level nodes

For cross-region or cross-species comparisons, individual neurons are grouped by type, and the graph represents type-to-type connectivity. For example, in C. elegans analysis, the 302 individual neurons might be grouped into ~100 neuron classes. In Drosophila, ~139,000 neurons collapse to ~8,000 types.

Tradeoff: Type-level graphs lose individual variation but are more robust to segmentation errors and enable statistical comparisons.

Edges: what represents a connection?

Chemical synapses as directed edges

Each chemical synapse is naturally directed: the presynaptic terminal (with vesicles) releases neurotransmitter onto the postsynaptic element (with receptors/PSD). This creates a directed edge from the presynaptic neuron to the postsynaptic neuron.

In graph notation: an edge (A → B) means “neuron A makes at least one chemical synapse onto neuron B.”

Gap junctions as undirected edges

Electrical synapses (gap junctions) allow bidirectional current flow. These are represented as undirected edges (A — B). Gap junctions are less common than chemical synapses in mammalian cortex but are prevalent in certain circuits (e.g., between inhibitory interneurons) and in invertebrate nervous systems.

Edge weights

Most neuron pairs that are connected have multiple synapses. How to represent this?

Binary (unweighted): Edge exists (1) or doesn’t (0). Simplest representation. Loses information about connection strength.

Synapse count: Edge weight = number of synapses from A to B. The most common weighting scheme. Ranges: C. elegans typically 1-50 synapses per pair; Drosophila 1-100+; mammalian cortex 1-20+ for most pairs, with some pairs having >50.

Total contact area: Edge weight = sum of cleft areas or PSD areas across all synapses. More biologically meaningful (larger PSD ≈ stronger synapse) but harder to measure accurately.

Estimated strength: In rare cases, functional data (paired recordings, calcium imaging) can estimate synaptic strength. This bridges structure and function but is available for very few connections.

The threshold problem

A critical practical decision: at what minimum synapse count do you call two neurons “connected”?

Threshold = 1: Include all detected synapses. Maximizes sensitivity but includes many false positives (single-synapse connections are noisy and may be detection errors).
Threshold = 3-5: Common in published analyses. Reduces noise but may miss genuine weak connections.
No threshold: Use continuous weights (synapse count) and avoid binarizing.

The effect of thresholding is dramatic. In a typical cortical dataset:

At threshold = 1: ~500,000 edges
At threshold = 3: ~150,000 edges
At threshold = 5: ~60,000 edges

Degree distributions, clustering coefficients, and motif counts all change substantially with threshold. Every analysis must report its threshold and justify the choice.

Adjacency matrices

Definition

For N neurons, the adjacency matrix A is an N×N matrix where entry A[i,j] = the weight of the edge from neuron i to neuron j (0 if no connection).

Properties:

Directed graph: A is generally asymmetric (A[i,j] ≠ A[j,i] unless the connection is reciprocal with equal weight)
Sparse: Most entries are zero. In cortex, each neuron connects to <1% of its neighbors, so >99% of the matrix is zeros.
Row sums = out-degree (for binary) or total output weight
Column sums = in-degree or total input weight

Sparse representation

For 100,000 neurons, the full adjacency matrix has 10^10 entries — ~40 GB at 32-bit floats, mostly zeros. In practice, connectomes are stored as sparse matrices:

Edge list format: Three columns: source, target, weight. Only non-zero entries stored. Most compact for very sparse graphs.
Compressed Sparse Row (CSR): Efficient for row-wise operations (e.g., “find all outputs of neuron X”).
Compressed Sparse Column (CSC): Efficient for column-wise operations (e.g., “find all inputs to neuron X”).

Tools for graph manipulation

Tool	Language	Strengths
NetworkX	Python	Easy API, rich algorithms, good for <100K nodes
igraph	R/Python/C	Fast, good for medium graphs (<1M nodes)
graph-tool	Python/C++	Fastest for large graphs, excellent SBM implementation
scipy.sparse	Python	Direct sparse matrix operations, integrates with NumPy
Neo4j	Java/Cypher	Graph database, good for persistent storage and queries

Multigraphs and multi-layer networks

Multigraphs

Two neurons may be connected by multiple synapses. Representing each synapse as a separate edge creates a multigraph. This preserves spatial information (each synapse has a location on the pre and post neuron) but is more complex to analyze.

Common simplification: Collapse multigraph to weighted simple graph where weight = synapse count.

Multi-layer networks

Different connection types can be represented as separate graph layers:

Layer 1: Excitatory chemical synapses
Layer 2: Inhibitory chemical synapses
Layer 3: Gap junctions

Each layer may have different topology. Analysis can examine each layer independently or study inter-layer relationships.

Worked example: constructing a graph from a synapse table

Given: A synapse table from CAVE with columns: synapse_id, pre_segment_id, post_segment_id, synapse_type, cleft_area

import pandas as pd
import networkx as nx

# Load synapse table
synapses = pd.read_csv("synapses.csv")

# Filter to chemical synapses only
chem = synapses[synapses.synapse_type == "chemical"]

# Aggregate: count synapses per neuron pair
edges = chem.groupby(["pre_segment_id", "post_segment_id"]).agg(
    synapse_count=("synapse_id", "count"),
    total_cleft_area=("cleft_area", "sum")
).reset_index()

# Apply threshold
edges_filtered = edges[edges.synapse_count >= 3]

# Build graph
G = nx.DiGraph()
for _, row in edges_filtered.iterrows():
    G.add_edge(
        row.pre_segment_id,
        row.post_segment_id,
        weight=row.synapse_count,
        cleft_area=row.total_cleft_area
    )

print(f"Nodes: {G.number_of_nodes()}, Edges: {G.number_of_edges()}")

Exercise: Re-run with thresholds of 1, 5, and 10. Plot the degree distribution at each threshold and observe how it changes.

Common misconceptions

Misconception	Reality	Teaching note
“The connectome is a fixed object”	Representation choices (threshold, weighting) create different graphs from the same data	Always report representational choices
“More edges = more accurate”	Low-threshold graphs include more noise from false synapse detections	Balance sensitivity and specificity
“Binary graphs are sufficient”	Synapse count carries biologically meaningful information about connection strength	Use weighted graphs when possible
“The adjacency matrix is the connectome”	The matrix is one representation; the underlying biology includes spatial structure, dynamics, and molecular identity	The graph is a model, not the territory

References

Rubinov M, Sporns O (2010) “Complex network measures of brain connectivity: Uses and interpretations.” NeuroImage 52(3):1059-1069.
Sporns O (2010) Networks of the Brain. MIT Press.
Varshney LR et al. (2011) “Structural properties of the Caenorhabditis elegans neuronal network.” PLoS Computational Biology 7(2):e1001066.
Dorkenwald S et al. (2024) “Neuronal wiring diagram of an adult brain.” Nature 634:124-138.
Scheffer LK et al. (2020) “A connectome and analysis of the adult Drosophila central brain.” eLife 9:e57443.