Targeted engineering of plant genomes holds great promise for ensuring food security and for producing biopharmaceuticals. However, such engineering requires thorough knowledge of cis-regulatory elements to precisely control both endogenous and introduced genes. To generate this knowledge, we established plant STARR-seq, a massively parallel reporter assay that can measure the condition-specific activity of hundreds of thousands of putative regulatory elements.
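In massively parallel reporter assays such as STARR-seq, an element's activity is commonly estimated by comparing how often its barcode appears in the reporter RNA with how often it appears in the input DNA library. The sketch below is a minimal illustration of that idea, not the authors' pipeline. It assumes raw read counts per element, normalizes both libraries to counts per million (CPM), and reports a log2(RNA/DNA) ratio. The function name `enrichment`, the pseudocount of 1, and the optional `reference` element used to re-center scores (for example, a minimal-promoter control construct) are all hypothetical choices.

```python
import math

def enrichment(rna_counts, dna_counts, pseudocount=1.0, reference=None):
    """Return log2(RNA/DNA) activity per element from raw barcode counts.

    rna_counts, dna_counts: dicts mapping element ID -> read count.
    Both libraries are scaled to counts per million (CPM) first, so that
    differences in sequencing depth between them cancel out.
    reference: optional (hypothetical) control element; if given, every
    score is expressed relative to it, so the reference itself scores 0.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    if rna_total == 0 or dna_total == 0:
        raise ValueError("both libraries need at least one read")

    scores = {}
    for element, dna in dna_counts.items():
        if dna == 0:
            continue  # absent from the input library: activity is undefined
        rna_cpm = rna_counts.get(element, 0) / rna_total * 1e6
        dna_cpm = dna / dna_total * 1e6
        # The pseudocount keeps the ratio finite when an element has no RNA reads.
        scores[element] = math.log2((rna_cpm + pseudocount) / (dna_cpm + pseudocount))

    if reference is not None:
        offset = scores[reference]
        scores = {e: s - offset for e, s in scores.items()}
    return scores
```

For example, with illustrative counts `rna = {"p1": 500, "p2": 50, "p3": 450}` and `dna = {"p1": 100, "p2": 100, "p3": 800}`, `enrichment(rna, dna)` ranks `p1` as the strongest element (about +2.3) and `p2` as the weakest (about -1.0). Working in CPM before taking the ratio is what makes scores comparable between libraries sequenced to different depths.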
We used plant STARR-seq to characterize over 75,000 promoters from Arabidopsis, maize and sorghum. We show that core promoter elements, GC content, and transcription factor binding sites all influence promoter strength. By performing the experiments in two assay systems, leaves of the dicot tobacco and protoplasts of the monocot maize, we detected species-specific differences in promoter activity. Building on these observations, we built computational models to predict promoter strength in both assay systems, which allowed us to design highly active synthetic promoters comparable in activity to the viral 35S minimal promoter.
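Models of promoter strength start from sequence features such as those named above. The following is a minimal sketch of how such features could be extracted, assuming three simple feature types: GC content, presence of a TATA box (matched with the consensus TATAWAW, W = A or T) on the forward strand, and counts of user-supplied transcription factor binding site patterns on both strands. The function names, the regex encoding of motifs, and the example G-box pattern `CACGTG` are illustrative assumptions; the models described here may use entirely different features.

```python
import re

# DNA complement table (N stays N), used to scan the reverse strand.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# TATA-box consensus TATAWAW (W = A or T) as a regular expression.
TATA_BOX = r"TATA[AT]A[AT]"

def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]

def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def count_sites(seq, pattern):
    """Count distinct motif sites on both strands.

    A lookahead keeps overlapping hits. Reverse-strand hits are mapped back
    to forward coordinates, so a palindromic site is counted once, not twice.
    """
    seq = seq.upper()
    n = len(seq)
    sites = set()
    for m in re.finditer(f"(?=({pattern}))", seq):
        sites.add((m.start(), len(m.group(1))))
    for m in re.finditer(f"(?=({pattern}))", reverse_complement(seq)):
        length = len(m.group(1))
        sites.add((n - m.start() - length, length))
    return len(sites)

def promoter_features(seq, tfbs_patterns):
    """Feature dict: GC content, forward-strand TATA box, TFBS site counts."""
    features = {
        "gc": gc_content(seq),
        "tata": int(re.search(TATA_BOX, seq.upper()) is not None),
    }
    for name, pattern in tfbs_patterns.items():
        features[name] = count_sites(seq, pattern)
    return features
```

Calling `promoter_features("CCTATAAATGGCACGTGAA", {"g_box": "CACGTG"})` yields a GC fraction of 8/19, a detected TATA box, and a single G-box site. The palindromic site is found on both strands but counted once. A dictionary per sequence like this can then be stacked into a feature matrix for whatever regression model is used.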
We recently assessed enhancer activity for over 175,000 accessible chromatin regions from Arabidopsis, tomato, maize and sorghum, and additionally tested almost 1,000,000 random sequences. We show that enhancers are orientation-independent and that their strength is determined by transcription factor binding sites and GC content. By testing these elements under different environmental conditions, we identified both constitutive and condition-specific enhancers and determined the features responsible for their activity. Using these data, we trained computational models to accurately predict and design condition-specific enhancers. Notably, our previous promoter models had little power to predict enhancer activity, and promoter strength alone was not predictive of endogenous gene expression. We are currently combining the knowledge gained for both element types to derive models that can predict endogenous gene expression, identify promising targets for genome engineering, and predict the outcome of genomic edits. Together with novel plant terminators and insulators, whose activity we are presently measuring, this comprehensive strategy will enable us to build tunable and programmable multi-gene cassettes that encode metabolic pathways to produce valuable bioproducts in crops.