In recent years, podcasts have rapidly emerged as a popular medium worldwide. In addition to being easily accessible online, podcasts are characterised by an intimate, aural delivery of their narrative content, creating exciting opportunities for digital scholarship on storytelling. Computational literary studies, however, remains (hyper)focused on written text, in spite of various calls for more multimodal research. This paper documents the construction of an open-source dataset of textual transcriptions, drawn from a large, representative sample of contemporary, English-language podcasts. The dataset covers a total of 412 days of audio content, transcribed and made available as bag-of-words frequency tables. By making these materials accessible, I hope to stimulate future study of podcasts using techniques from (computational) literary studies. Preliminary experiments suggest that podcast categories in particular offer a fruitful avenue for exploring the various textual modes in which podcast producers successfully cater for a diverse global audience.
