Blog
Notes from the team and stories from the community: what we shipped, how to use it, and what researchers have built with the platform, from scholarship journeys to benchmark methodology.
Where the milliseconds go: per-module latency on the benchmarks page
We just shipped a stacked latency breakdown for GBLUP and MLM-GWAS so you can see exactly which substep dominates each run.
Tweak a threshold, see the diff: server-side QC previews are live
Advanced QC now runs through the same server endpoint as the production pipeline, with snapshots you can diff between runs.
From scholarship to published GS model: Amira's first year on the platform
How a CGIAR scholarship recipient went from her first GBLUP run to a peer-reviewed maize prediction paper in eleven months.
Scholarships, embargoes, and how we keep the platform open to public breeders
Who qualifies, what's included, and how the access program funds compute time for students and public-sector labs.
Reading a Manhattan plot without lying to yourself
A practical guide to interpreting MLM-GWAS output: genome-wide significance, suggestive hits, LD blocks, and what FDR really means.
Why we publish benchmarks against rrBLUP, sommer, GAPIT and PLINK
Marketing numbers are useless without code, seeds and datasets. Here is how our benchmark suite is built and what it does and doesn't prove.
Crop Yield Prediction: from genotype × phenotype CSV to held-out R² in one upload
How the Yield module turns marker matrices and trait files into GEBVs, with held-out R², RMSE, marker effects, and a persisted results page.
Stress Resistance GWAS: per-marker regression with Bonferroni you can actually trust
The Stress Resistance module runs MLM-GWAS with kinship and PCA correction, then exports Manhattan-ready marker statistics and BH-adjusted q-values.
Breeding Optimization: ranking parents from real multi-trait CSVs
How the Breeding module validates trait weights, ranks candidate parents on a Smith-Hazel index, and predicts expected cross performance.
Environmental Intelligence: real weather, real GDD, no spreadsheets
The Environment module fetches historical weather and computes GDD, heat-stress days, dry days, and precipitation metrics for any trial site.
Genomic Selection: GBLUP, BayesB, and rrBLUP under one roof
The Genomic Selection page lets you compare prediction methods on the same fold partition, with paired CV correlations and runtime side by side.
Multi-trait Genomic Selection: borrow strength across correlated traits
Multi-trait GS uses genetic correlations between traits to improve prediction accuracy, especially for low-heritability or sparsely measured traits.
GxE and MET: decomposing G, E, and the interaction without an R script
The MET module fits multi-environment models with reaction-norm and factor-analytic structures and tells you how much variance is GxE.
Imputation: filling missing genotypes without inflating your accuracy
The Imputation module supports mean, kNN, and reference-based imputation, with honest reporting of imputation quality.
Data QC: MAF, call rate, HWE, heterozygosity, and contamination in one pass
The QC module runs the standard genotype QC suite with server-side previews, run-to-run diffs, and snapshots you can revisit.
Variant Annotation: from marker IDs to gene context
The Annotation module maps significant markers to gene models and predicts coding-variant effects against the reference annotation.
Population Structure: PCA and admixture without the command line
Run a PCA and a model-based admixture estimate on your panel, then export the components for downstream GWAS correction.
Trial Design: alpha-lattice, augmented, and partially replicated layouts
Generate field-trial layouts that minimise spatial confounding, with seed-lot lists and per-plot CSVs ready for the planter.
Pangenome: structural variants beyond the reference
The Pangenome module brings presence/absence variants and large structural variation into your association and prediction workflows.
Crop Databases: synced reference data for the crops you care about
The Crop Databases page exposes reference panels, marker maps, and ontology trait codes, kept in sync via a scheduled job.
Bulk runs and Pipelines: chain QC → GS → GWAS in one click
Pipelines stitch the platform's modules into reusable workflows; the Bulk runner executes the same pipeline across many datasets in parallel.
Jobs, Queue, and Observability: knowing what your runs are doing
Real-time job progress, a transparent queue, and an observability page that surfaces wall time, memory and quota usage per run.
Projects: scoping data, runs and collaborators
Projects group datasets, runs and members behind a single permission boundary, with per-project quotas and audit trails.
Publishing results: shareable pages with embargo and DOI
Publish a run to a public results page with one click, attach an embargo until your paper lands, and mint a DOI for citation.
API keys, SDK, and Webhooks: driving the platform from your code
Issue scoped API keys, call the platform from Python or TypeScript, and subscribe to webhook events when runs finish.
Billing, quotas and credits: how runs are metered
Every run debits a quota counted in credits; here's how credits are priced, refunded on failure, and split across pipeline steps.
Notifications and Ask: stay close to what's happening
In-app notifications for finished runs and quota events, plus an Ask panel that answers questions about your own datasets.