Method
BACHI performs symbolic chord recognition in two boundary-aware stages that mirror human ear-training.
- Patch Embedding & Transformer Encoder: Beat-synchronous piano-roll tokens are embedded along both the piano-roll frame and time dimensions, then processed by six transformer encoder layers.
- Boundary-Conditioned Context: A supervised chord boundary detector modulates encoder hidden states through FiLM and a local context window.
- Confidence-Ordered Decoding: A masked transformer decoder iteratively fills in the root, quality, and bass of each chord in order of confidence.
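
The last two components can be illustrated with a minimal, dependency-free sketch. It is not BACHI's implementation. `film` shows the generic FiLM operation (a per-feature scale and shift); how the scale `gamma` and shift `beta` are derived from the boundary detector is not specified here, so they are plain inputs. `decode_by_confidence` assumes that "confidence order" means committing the masked slot with the highest top probability first and then re-predicting the rest. The names `film`, `decode_by_confidence`, `toy_predict`, and the `Predictor` type are hypothetical, and `toy_predict` is a hand-written stand-in for the masked decoder with made-up probabilities.

```python
from typing import Callable, Dict, List, Optional, Tuple

Slots = Dict[str, Optional[str]]
# Hypothetical predictor interface: given the partially filled slots, return
# a probability distribution over candidate labels for every slot.
Predictor = Callable[[Slots], Dict[str, Dict[str, float]]]


def film(hidden: List[float], gamma: List[float], beta: List[float]) -> List[float]:
    """Generic FiLM: scale and shift each feature, gamma * h + beta."""
    return [g * h + b for h, g, b in zip(hidden, gamma, beta)]


def decode_by_confidence(
    predict: Predictor,
    slot_names: Tuple[str, ...] = ("root", "quality", "bass"),
) -> Tuple[Slots, List[str]]:
    """Fill masked slots one at a time, most confident slot first (assumed rule)."""
    slots: Slots = {name: None for name in slot_names}
    order: List[str] = []
    while any(value is None for value in slots.values()):
        probs = predict(slots)  # re-predict, conditioned on slots filled so far
        best_slot, best_label, best_p = None, None, -1.0
        for name in slot_names:
            if slots[name] is not None:
                continue  # already committed
            label, p = max(probs[name].items(), key=lambda kv: kv[1])
            if p > best_p:
                best_slot, best_label, best_p = name, label, p
        slots[best_slot] = best_label
        order.append(best_slot)
    return slots, order


def toy_predict(slots: Slots) -> Dict[str, Dict[str, float]]:
    """Toy stand-in for the masked decoder; probabilities are invented."""
    root_known = slots["root"] is not None
    return {
        "root": {"C": 0.9, "G": 0.1},
        "quality": {"maj": 0.6, "min": 0.4},
        # Knowing the root makes the toy model more certain about the bass.
        "bass": {"E": 0.8, "C": 0.2} if root_known else {"E": 0.5, "C": 0.5},
    }


if __name__ == "__main__":
    print(film([1.0, 2.0], [2.0, 0.5], [0.0, 1.0]))
    print(decode_by_confidence(toy_predict))
```

In the toy run the root is committed first, which raises the model's confidence in the bass, so the bass is filled before the quality. This is the kind of dependency that a fixed root-quality-bass order would not exploit.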
Training Data: POP909-CL with human-corrected annotations, and a curated, deduplicated classical corpus combining When-in-Rome and DCML.