The exponential growth of DNA sequencing data calls for efficient solutions for storing and querying large-scale k-mer sets. While recent indexing approaches use spectrum-preserving string sets (SPSS), full-text indexes, or hashing, they often impose structural constraints or demand extensive parameter tuning, limiting their usability across different datasets and data types. Here, we propose FMSI, a minimally parametrized, highly space-efficient membership index and compressed dictionary for arbitrary k-mer sets. FMSI combines approximate shortest superstrings with the Masked Burrows-Wheeler Transform (MBWT). Unlike traditional
methods, FMSI operates without predefined assumptions on k-mer overlap patterns but exploits them when available. We demonstrate that FMSI requires less memory for processing queries than established indexes such as SSHash, Spectral Burrows-Wheeler Transform (SBWT), and Conway-Bromage-Lyndon (CBL), while supporting fast membership and dictionary queries. Depending on the dataset, k, or sampling, FMSI offers 2–3x space savings for processing queries compared with other state-of-the-art indexes; only a space-optimized SBWT (without indexing reverse complements) matches its memory efficiency in some cases, but it is 2–3x slower. Overall, this work establishes superstring-based indexing as a highly general, flexible, and scalable
approach for genomic data, with direct applications in pangenomics, metagenomics, and large-scale genomic databases.