Module 12: Big Data in Connectomics
Teaching Deck
Learning Objectives
- Describe core architecture patterns for petascale connectomics data
- Plan compute, storage, and indexing strategies for large EM volumes
- Implement query workflows that preserve provenance and reproducibility
- Identify bottlenecks and failure modes in large-scale analysis pipelines
Session Outcomes
- Learners can meet the module capability target.
- Learners can produce one evidence-backed artifact.
- Learners can state one limitation or uncertainty.
Agenda (60 min)
- 0-10 min: Frame and model
- 10-35 min: Guided practice
- 35-50 min: Debrief and misconception correction
- 50-60 min: Competency check + exit ticket
Capability Target
Produce a scalable, reproducible query-and-analysis plan for a large connectomics dataset, including storage assumptions, indexing strategy, and provenance capture.
Concept Focus
1) Data architecture is scientific method infrastructure
- Technical: storage format, chunking, and indexing influence what questions are tractable.
- Plain language: bad architecture can make good science impossible.
- Misconception guardrail: compute scale alone does not solve poor data design.
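To make the chunking point concrete, here is a minimal pure-Python sketch (no particular storage library assumed) that counts how many chunks an axis-aligned query box touches. The chunk shapes and volume size are illustrative; the lesson is that two layouts with the same chunk volume can differ by an order of magnitude in I/O for the same query.

```python
import math

def chunks_touched(query_lo, query_hi, chunk_shape):
    """Count how many chunks an axis-aligned query box intersects."""
    n = 1
    for lo, hi, c in zip(query_lo, query_hi, chunk_shape):
        n *= math.floor((hi - 1) / c) - math.floor(lo / c) + 1
    return n

# A thin z-slab query: one section through a 2048 x 2048 x N volume.
lo, hi = (0, 0, 500), (2048, 2048, 501)

# Isotropic chunks vs. z-flattened chunks: same chunk volume,
# very different I/O cost for this access pattern.
print(chunks_touched(lo, hi, (128, 128, 128)))  # 16 * 16 * 1 = 256 chunks
print(chunks_touched(lo, hi, (512, 512, 8)))    # 4 * 4 * 1 = 16 chunks
```

This is why the exercise below maps access patterns to index and chunk choices before any compute is provisioned.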
Core Workflow
- See module page for details.
60-Minute Run-of-Show
- **00:00-08:00** Architecture framing and failure examples
- **08:00-20:00** Access-pattern to index mapping exercise
- **20:00-34:00** Query profiling and bottleneck diagnosis
- **34:00-46:00** Provenance logging implementation
- **46:00-56:00** Team review of reproducibility gaps
- **56:00-60:00** Competency check and next-step assignment
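For the provenance-logging segment, one possible starting point is a stdlib-only sketch like the following. The field names and the example bucket path are illustrative, not a standard schema; the point is that each run records what data, which version, which parameters, and which code produced an output.

```python
import hashlib
import json
import subprocess
from datetime import datetime, timezone

def provenance_record(data_path, data_version, params):
    """Build a minimal provenance record for one analysis run.

    Field names are illustrative, not a standard schema.
    """
    record = {
        "data_path": data_path,
        "data_version": data_version,
        "params": params,
        # Hash the parameters so identical runs are detectable.
        "params_hash": hashlib.sha256(
            json.dumps(params, sort_keys=True).encode()
        ).hexdigest()[:12],
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
    }
    try:
        # Code version, if running inside a git checkout.
        record["git_commit"] = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        record["git_commit"] = None
    return record

# Hypothetical dataset path and parameters, for illustration only.
rec = provenance_record(
    "gs://example-bucket/volume", "v2.1",
    {"min_synapse_size": 40, "segmentation": "seg_v3"},
)
print(json.dumps(rec, indent=2))
```

Attaching a record like this to every output file addresses the "notebook history alone" misconception directly: the provenance travels with the artifact, not with the session.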
Misconceptions to Watch
- Compute scale alone does not solve poor data design.
- "It runs eventually" is not acceptable for iterative science.
- Notebook history alone is insufficient provenance.
Studio Activity
Activity Output Checklist
- Evidence-linked artifact submitted.
- At least one limitation or uncertainty stated.
- Revision point captured from feedback.
Assessment Rubric
- Minimum pass
  - Query design matches analysis goal and data shape.
  - Provenance requirements are explicit and actionable.
  - Bottlenecks are identified with one realistic mitigation.
- Strong performance
  - Separates exploratory and production query paths.
  - Quantifies tradeoffs (latency, cost, reproducibility).
  - Anticipates failure recovery and rollback needs.
- Common failure modes
  - Index choices disconnected from query workload.
  - Missing version metadata in outputs.
  - Optimization attempts without benchmark baseline.
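The last failure mode is cheap to prevent. A minimal timing-harness sketch, using only the standard library, gives learners a baseline to record before any optimization attempt (the workload here is a placeholder):

```python
import statistics
import time

def benchmark(fn, *args, repeats=5, warmup=1):
    """Time fn over several repeats; report median and spread.

    The median resists outliers from caching and scheduler noise.
    """
    for _ in range(warmup):
        fn(*args)  # warm caches so the baseline reflects steady state
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - t0)
    return {
        "median_s": statistics.median(times),
        "min_s": min(times),
        "max_s": max(times),
    }

# Record this baseline BEFORE optimizing, so speedups are measurable.
baseline = benchmark(sum, range(1_000_000))
print(f"median: {baseline['median_s']:.4f} s")
```

Storing the baseline dict alongside the provenance record makes "it got faster" a checkable claim rather than an impression.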
Exit Ticket
Document one query you use with:
- data source/version,
- expected runtime class,
- one provenance field you currently miss.
References (Instructor)
- H01 human cortical fragment release and infrastructure notes.
- MICrONS data platform documentation.
- Januszewski et al. (2018) for scalable reconstruction context.
Teaching Materials
- Module page: /modules/module12/
- Slide page: /modules/slides/module12/
- Worksheet: /assets/worksheets/module12/module12-activity.md