Final Recomputation & Visualization Regeneration

Current Status

✅ Phase Complete: Enrichment & Validation

📊 Corpus Ready for Final Recomputation

Output: paper_rankings_all_final.json (with fresh metrics)

Output: author_rankings_final.json (35,641 authors)

field_map_full.html — Citation network (7,503 nodes, 85K edges)
- Force-directed layout with zoom/pan
- Ranked list (all papers, sortable)
- Toggle citation/co-authorship networks
coauthor_map_full.html — Co-authorship network (35,641 nodes)
- Force-directed layout
- Ranked author list
career_arcs_plot.html — Interactive visualization (NEW)
- Timeline: publications per year × expert
- Color by paper role/importance
- Hover: citation impact trajectory
journal_club_threshold_*.html (4 versions)
- Refresh with updated metrics
- Threshold 10, 15, 20, 30
kcore_map.html — K-core shells
- Updated k-values and coloring
evolution_graph_full.html — Field timeline
- Verify data freshness
index.html — Dashboard (updated)
- Link to all visualizations
- Updated statistics

Output: 4 sets of files (threshold_10, threshold_15, threshold_20, threshold_30)

BIBLIOGRAPHY_ANALYSIS_DOCS.md
- Verify corpus size: 7,503 papers
- Verify author count: 35,641 (after merges)
- Update metrics explanations
- Document career arc visualization
METHODOLOGY_AND_PIPELINE.md (NEW)
- Document full pipeline flow
- Explain metric computations
- Discuss thresholds and choices
- Flag limitations & biases
- Prepare for critical review
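The two "Verify" items above can be checked mechanically before the docs are updated. The sketch below is a minimal example under one loud assumption: that each ranking file is a single top-level JSON list or object with one entry per paper or author. The real file layout is not stated here, so `count_records` and `check_counts` are hypothetical helpers, not part of the pipeline.

```python
import json
from pathlib import Path

# Expected totals taken from the checklist above.
EXPECTED_COUNTS = {
    "paper_rankings_all_final.json": 7503,
    "author_rankings_final.json": 35641,
}


def count_records(path):
    """Count top-level entries in a JSON file (ASSUMED: one entry per record)."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, (list, dict)):
        raise ValueError(f"{path}: expected a JSON list or object at top level")
    return len(data)


def check_counts(expected_by_file, base_dir="."):
    """Return a list of human-readable mismatches (empty list means all good)."""
    problems = []
    for name, expected in expected_by_file.items():
        path = Path(base_dir) / name
        if not path.exists():
            problems.append(f"{name}: file not found")
            continue
        actual = count_records(path)
        if actual != expected:
            problems.append(f"{name}: expected {expected}, found {actual}")
    return problems
```

Calling `check_counts(EXPECTED_COUNTS)` from the output directory would return an empty list when both counts match, and one message per mismatch otherwise.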

Before considering this “ready for review”:

Recommended sequence:

Estimated scope: 6-8 discrete computational tasks, some parallelizable

Once this recomputation is complete and verified, schedule a separate work session for:

Critical review of thresholds
- In-degree/out-degree cutoffs (10 for each direction — too strict? too loose?)
- Composite score formula (80% PageRank + 20% k-core — still optimal?)
- Journal club thresholds (0.10, 0.15, 0.20, 0.30 — right choices?)
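To make the 80/20 question concrete, here is a small sketch of a weighted composite. Only the weights (80% PageRank, 20% k-core) come from the notes above. The min-max normalization of each input to [0, 1] is an assumption, as are the function names; the actual pipeline may scale its inputs differently.

```python
def min_max(values):
    """Rescale a {paper_id: value} mapping to the range [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    # If every value is identical there is nothing to rank on; use 0.0.
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def composite_scores(pagerank, kcore, w_pagerank=0.8, w_kcore=0.2):
    """Weighted sum of normalized PageRank and normalized k-core number.

    ASSUMPTION: both inputs are min-max normalized before weighting.
    Both mappings must cover the same paper ids.
    """
    if set(pagerank) != set(kcore):
        raise ValueError("pagerank and kcore must cover the same papers")
    pr = min_max(pagerank)
    kc = min_max(kcore)
    return {p: w_pagerank * pr[p] + w_kcore * kc[p] for p in pagerank}


# Toy data (hypothetical values, not from the corpus).
toy_pagerank = {"A": 0.5, "B": 0.3, "C": 0.2}
toy_kcore = {"A": 2, "B": 3, "C": 1}
toy_scores = composite_scores(toy_pagerank, toy_kcore)
toy_ranking = sorted(toy_scores, key=toy_scores.get, reverse=True)
```

Re-running the same function with other weight pairs (for example 0.7/0.3) and comparing the resulting rankings is one way to see how sensitive the top of the list is to the 80/20 choice.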
Methods paper weighting
- Current: out-degree ≥37 (top 5%)
- Consider: boost formula for infrastructure papers?
- Consider: temporal dynamics (older methods vs. newer)?
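The "top 5%" cutoff can be recomputed from the out-degree distribution rather than hard-coded, which helps when the corpus changes. This is a minimal sketch using a nearest-rank rule; the rule itself and the helper names are assumptions, and the notes do not say how the current value of 37 was derived.

```python
import math


def top_fraction_cutoff(out_degrees, fraction=0.05):
    """Smallest out-degree that still falls in the top `fraction` of papers.

    Uses a nearest-rank rule (ASSUMPTION). With ties at the cutoff, slightly
    more than `fraction` of the papers can end up at or above it.
    """
    if not out_degrees:
        raise ValueError("out_degrees is empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(out_degrees, reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))
    return ranked[k - 1]


def flag_methods_papers(out_degree_by_paper, fraction=0.05):
    """Return the ids of papers whose out-degree meets the cutoff."""
    cutoff = top_fraction_cutoff(list(out_degree_by_paper.values()), fraction)
    return {p for p, d in out_degree_by_paper.items() if d >= cutoff}
```

Comparing the cutoff returned for the refreshed corpus against the current value of 37 would show whether the threshold has drifted.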
Community detection robustness
- Louvain algorithm choices (randomness, resolution)
- Alternative: other clustering methods?
- Validation: compare to hand-labeled communities
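One simple way to quantify robustness is to compare two partitions of the same papers, for example two Louvain runs with different random seeds, or one run against hand labels. The sketch below computes the Rand index by pair counting. It assumes each partition is available as a `{paper_id: community_label}` dict; producing those dicts is outside the sketch, and the Rand index is offered as one possible agreement measure, not the one the pipeline uses.

```python
from itertools import combinations


def rand_index(partition_a, partition_b):
    """Fraction of node pairs on which two partitions agree.

    A pair "agrees" when both partitions put the two nodes together, or both
    put them apart. Labels themselves do not need to match. Only nodes present
    in both partitions are compared.
    """
    nodes = sorted(set(partition_a) & set(partition_b))
    if len(nodes) < 2:
        raise ValueError("need at least two shared nodes")
    agree = 0
    total = 0
    for u, v in combinations(nodes, 2):
        same_a = partition_a[u] == partition_a[v]
        same_b = partition_b[u] == partition_b[v]
        agree += same_a == same_b
        total += 1
    return agree / total
```

A value of 1.0 means the two partitions are identical up to relabeling. The loop is quadratic in the number of nodes, so on the full 7,503-paper network it covers roughly 28 million pairs; sampling pairs is a reasonable shortcut if that is too slow.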
Author merging completeness
- Are there other Gray-Roncal-like merges we missed?
- Automated name similarity scoring?
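A first pass at automated name similarity scoring could use the standard library's `difflib.SequenceMatcher`. The sketch below is illustrative only: the normalization rules, the 0.85 threshold, and the example names are all assumptions, and any flagged pair would still need manual confirmation before merging.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name):
    """Lowercase, treat hyphens as spaces, drop periods, collapse whitespace."""
    cleaned = name.lower().replace("-", " ").replace(".", "")
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_merges(names, threshold=0.85):
    """Return (name_a, name_b, score) for pairs at or above the threshold.

    The threshold is a HYPOTHETICAL starting point, to be tuned by hand.
    """
    pairs = []
    for a, b in combinations(sorted(set(names)), 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical names: a hyphenated and an unhyphenated variant of one author.
example = candidate_merges(["A. Example-Name", "A. Example Name", "B. Other"])
```

Comparing all pairs among 35,641 authors is too slow in pure Python, so in practice the names would first be grouped into blocks (for instance by last token of the normalized name) and compared only within each block.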
Graph properties
- Citation network: power-law? small-world?
- Co-authorship: assortative? community structure?
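For the "assortative?" question, degree assortativity is the Pearson correlation between the degrees found at the two ends of each edge. The sketch below computes it directly from an undirected edge list, assuming a simple graph (no self-loops, no duplicate edges). The function name and input format are assumptions; a graph library would normally provide an equivalent routine.

```python
from collections import Counter


def degree_assortativity(edges):
    """Pearson correlation of endpoint degrees over an undirected edge list.

    Positive values mean high-degree nodes tend to link to other high-degree
    nodes; negative values mean hubs tend to link to low-degree nodes.
    ASSUMES a simple graph: each edge listed once, no self-loops.
    """
    if not edges:
        raise ValueError("edge list is empty")
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Count every edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys hold the same values, so one mean suffices
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        raise ValueError("undefined: every edge endpoint has the same degree")
    return cov / var
```

As a sanity check, a star graph (one hub connected to several leaves) is maximally disassortative and should give a value of -1.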

Once locked and verified, can tackle:

Ready to execute the final recomputation pass?