Fluid is an exciting opportunity to work on new programming language foundations designed to make data science more open, intelligible and accessible. If you are a University of Cambridge MPhil or Part III student in Computer Science looking for a project this October, and would like to discuss the possibility of working on Fluid, please contact Dr Roly Perera, Early Career Advanced Fellow, Institute of Computing for Climate Science, University of Cambridge.
Charts and other visual summaries, curated by journalists and scientists from real-world data and simulations, are how we understand our changing world and the anthropogenic sources of that change. But the visual artefacts we are actually presented with are opaque: any relationship to the underlying data is lost. How can we expect to understand, critique or evaluate claims based on a bitmap? This is challenging enough for an expert with access to the source code and data used to derive the outputs; for a non-expert the prospects are even worse.
Fluid is a new “transparent” programming language, being developed at the Institute of Computing for Climate Science in Cambridge in collaboration with the University of Bristol, that makes it easy to create charts and figures which are linked to data, enabling a user to interactively discover what visual elements actually represent. The key idea is to incorporate a bidirectional dynamic dependency analysis into the language runtime, allowing it to track dependencies that arise as outputs (such as charts and tables) are computed from data. This information is then used to automatically enrich rendered output with interactions which allow a reader to explore the relationship to data directly through the artefact, by selecting visual features of interest. Fluid uses so-called “program slicing” techniques based on Galois connections, a neat mathematical abstraction which characterises exactly the relationship between sets of inputs and the sets of outputs which depend on them.
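To give a flavour of the Galois connection idea, here is a minimal Python sketch. It is an illustration only and not how Fluid is implemented: Fluid tracks dependencies dynamically as a program runs, whereas this sketch hard-codes a small dependency relation. The names `DEPENDS_ON`, `demanded_by`, `available_from` and the chart elements are all hypothetical. The backward function maps a selection of outputs to the least set of inputs they need; the forward function maps a selection of inputs to the outputs that can be computed from those inputs alone. The final check confirms, by brute force over all selections, that the two functions form a Galois connection.

```python
from itertools import chain, combinations

# Hypothetical dependency relation (an assumption for illustration):
# each output element maps to the set of input values it was computed from.
DEPENDS_ON = {
    "bar_2020": {"emissions_2020"},
    "bar_2021": {"emissions_2021"},
    "total_label": {"emissions_2020", "emissions_2021"},
}


def demanded_by(outputs):
    """Backward direction: the least set of inputs needed by the selected outputs."""
    return set().union(*(DEPENDS_ON[o] for o in outputs))


def available_from(inputs):
    """Forward direction: the outputs whose dependencies all lie within `inputs`."""
    selected = set(inputs)
    return {o for o, deps in DEPENDS_ON.items() if deps <= selected}


def powerset(items):
    """All subsets of `items`, each as a set."""
    items = list(items)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [set(s) for s in subsets]


def is_galois_connection():
    """Check: demanded_by(ys) <= xs  if and only if  ys <= available_from(xs)."""
    all_inputs = set().union(*DEPENDS_ON.values())
    return all(
        (demanded_by(ys) <= xs) == (ys <= available_from(xs))
        for ys in powerset(DEPENDS_ON)
        for xs in powerset(all_inputs)
    )
```

In this toy setting, selecting the hypothetical `total_label` element asks which data it depends on (`demanded_by`), while selecting a data value asks which chart elements it fully determines (`available_from`). The equivalence checked by `is_galois_connection` is what makes the two directions of the analysis agree with each other.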
The live demos on the website show the interactive queries we currently support, but these only scratch the surface of what this kind of infrastructure should make possible. There are many opportunities for an imaginative and technically strong student to help move this idea forward. Your project could go in a number of directions, depending on whether your interests lie more towards programming languages, formal methods or data science. A programming languages project would extend Fluid into a literate programming tool, by adding Markdown support and the ability to embed computational content via a Lisp-style backquote mechanism. A more mathematical project might add multidimensional arrays to the language, along with various array operations inspired by linear algebra and an extension of the dependency analysis to these new operations. A project focused more on science communication would use Fluid to adapt a piece of real-world climate science into a “long-form” essay or interactive explanation intended for a non-specialist audience. (See distill.pub for some examples.)
If you think this sounds interesting, please get in touch to arrange an initial chat. Whatever form your project takes, we would aim for your work to be incorporated into our main development codebase, so that it forms a genuine contribution to the overarching project. You will get to present your work to researchers and data scientists at the Institute of Computing for Climate Science and The Alan Turing Institute, and work with PhD students at Cambridge and Bristol. A strong background in functional programming, maths and/or science is a must. You can expect to gain experience in programming languages research, data analysis and data visualisation, with close supervisor support.