Fluid is an exciting opportunity to work on new programming language foundations designed to make data science more open, intelligible and accessible. If you are a University of Cambridge MPhil or Part III student in Computer Science starting in October 2025 and looking for a Master's project, then we may have an opportunity for you. Also check out our poster.
Charts and other visual summaries, curated by journalists and scientists from real-world data and simulations, are how we understand our changing world and the anthropogenic sources of that change. But the visual artifacts we are actually presented with are opaque: any relationship to the underlying data is lost. How can we expect to evaluate claims based on a bitmap? This is challenging enough for an expert with access to the source code and data used to derive the outputs; for a non-expert the prospects are even worse.
Fluid is a new “transparent” programming language, being developed at the Institute of Computing for Climate Science in Cambridge in collaboration with the University of Bristol, that makes it easy to create charts and figures which are linked to data, enabling a user to interactively discover what visual elements actually represent. The key idea is to incorporate a bidirectional dynamic dependency analysis into the language runtime, allowing it to track dependencies that arise as outputs (such as charts and tables) are computed from data. This information is then used to automatically enrich rendered output with interactions which allow a reader to explore the relationship to data directly through the artefact, by selecting visual features of interest. Fluid uses so-called “program slicing” techniques based on Galois connections, a neat mathematical abstraction which characterises exactly the relationship between sets of inputs and sets of outputs which depend on them.
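To give a flavour of the Galois-connection idea, here is a minimal sketch in Python rather than Fluid. It assumes the dependencies have already been recorded as a simple table from output elements to the input cells they were computed from; the table, the cell names and all function names are hypothetical, and this is not how the Fluid runtime represents or computes dependencies. It shows a forward map from sets of inputs to sets of outputs, a backward map in the other direction, and a brute-force check that the two form a Galois connection.

```python
from itertools import combinations

# Hypothetical dependency table: each output element maps to the input
# cells it was computed from. Illustrative only; not Fluid's data model.
DEPENDS_ON = {
    "bar_2019": {"row1.emissions", "row2.emissions"},
    "bar_2020": {"row3.emissions"},
    "total_label": {"row1.emissions", "row2.emissions", "row3.emissions"},
}
ALL_INPUTS = set().union(*DEPENDS_ON.values())


def needed_inputs(outputs):
    """Inputs that at least one of the selected outputs was computed from."""
    return set().union(*(DEPENDS_ON[o] for o in outputs))


def affected_outputs(inputs):
    """Forward map: outputs that depend on at least one selected input."""
    selected = set(inputs)
    return {out for out, deps in DEPENDS_ON.items() if deps & selected}


def inputs_confined_to(outputs):
    """Backward map: inputs whose dependent outputs all lie in `outputs`."""
    allowed = set(outputs)
    return {i for i in ALL_INPUTS if affected_outputs({i}) <= allowed}


def subsets(items):
    """All subsets of a finite collection, as sets."""
    items = sorted(items)
    return (set(c) for r in range(len(items) + 1) for c in combinations(items, r))


def is_galois_connection():
    """Check: affected_outputs(X) <= Y  iff  X <= inputs_confined_to(Y)."""
    return all(
        (affected_outputs(xs) <= ys) == (xs <= inputs_confined_to(ys))
        for xs in subsets(ALL_INPUTS)
        for ys in subsets(DEPENDS_ON)
    )


if __name__ == "__main__":
    print(sorted(needed_inputs({"bar_2020"})))
    print(sorted(affected_outputs({"row3.emissions"})))
    print(is_galois_connection())
```

In this toy setting, selecting a bar corresponds to calling `needed_inputs`, while selecting a data cell corresponds to calling `affected_outputs`. The point of the Galois-connection property is that the forward and backward maps determine each other, so the two directions of exploration cannot drift out of sync.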
A key use case for Fluid is to make it easy to present real-world climate science in “long-form” explorable essays and interactive articles intended for non-specialist audiences, such as policymakers. (See distill.pub for some examples of so-called “explorable explanations”.) There are two main technical goals driving the next iteration of the platform:
Computational explanations: information about the specific steps that were involved in computing a particular feature of the output (e.g. the whiskers decorating a bar in a bar chart). This is a potentially powerful transparency feature, allowing readers (perhaps during peer review) to discover otherwise hidden or obscured facts about the data underpinning a visualisation. You could work on the interpreter which generates the computational explanations in the first place, or on ML techniques for turning these computational explanations into more user-friendly natural language explanations. The novel contribution that Fluid can make to this latter problem is to provide an authoritative ground truth for the generation of the natural language, offering the prospect of a “trusted” or reliable form of open, self-explanatory artifact. (See “AI reading assistant” in the poster.)
Transparent text: natural language (such as the expository text in a climate report for policymakers) in technical contexts often has a semi-formal computational interpretation, most obviously quantitative phrases or other fragments of text expressing data-driven claims. For example, the statement that under a particular emissions scenario, global warming is extremely likely to exceed 2°C in the 21st century can be underwritten by a Fluid program that assigns a specific interpretation to this text in terms of the distributions of the underlying data used (by the report author) to reach that conclusion. You could help develop some of the infrastructure needed here, or AI tooling which replaces fragments of text by expressions that compute that text from data. (See “AI authoring assistant” in the poster.)
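As a rough illustration of the transparent text idea, the sketch below (in Python, not Fluid) computes a fragment of expository text from data instead of hard-coding it. The ensemble values are invented purely for illustration, the function names are hypothetical, and the thresholds follow the calibrated likelihood terms used in IPCC reports (for example, “extremely likely” for 95–100% probability). Alongside the text it returns the indices of the data points supporting the claim, which is the kind of link a reader could follow from the sentence back to the data.

```python
# Lower probability bounds for calibrated likelihood terms, following the
# IPCC uncertainty guidance. Only the upper part of the scale is used here.
LIKELIHOOD_TERMS = [
    (0.99, "virtually certain"),
    (0.95, "extremely likely"),
    (0.90, "very likely"),
    (0.66, "likely"),
]

# Hypothetical end-of-century warming values (in °C) for one emissions
# scenario. These numbers are made up for illustration; they are not real data.
HYPOTHETICAL_WARMING_C = [
    1.9, 2.1, 2.2, 2.2, 2.3, 2.3, 2.4, 2.4, 2.5, 2.5, 2.6, 2.6, 2.7,
    2.7, 2.8, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.8,
]


def likelihood_phrase(probability):
    """Map a probability to the strongest calibrated term it supports."""
    for lower_bound, term in LIKELIHOOD_TERMS:
        if probability >= lower_bound:
            return term
    return "not assessed as likely"


def warming_claim(samples_c, threshold_c):
    """Return the claim text and the indices of the samples that support it."""
    supporting = [i for i, value in enumerate(samples_c) if value > threshold_c]
    probability = len(supporting) / len(samples_c)
    text = (
        f"global warming is {likelihood_phrase(probability)} "
        f"to exceed {threshold_c:g}°C"
    )
    return text, supporting


if __name__ == "__main__":
    text, supporting = warming_claim(HYPOTHETICAL_WARMING_C, 2.0)
    print(text)
    print(f"supported by {len(supporting)} of {len(HYPOTHETICAL_WARMING_C)} samples")
```

If the underlying data changed so that fewer samples exceeded the threshold, the phrase would change with it, which is what distinguishes text computed from data from text that merely sits next to it.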
The live demos on the Fluid website only scratch the surface of what transparent programming languages like Fluid should enable when fully realised. There are many opportunities for an imaginative and technically strong student to help move this idea forward as part of an MPhil project; your project will live somewhere at the intersection of programming languages research, AI/ML, HCI, NLP and data science, depending on your skills/background and research interests.
If this sounds interesting, please get in touch with Dr Roly Perera, Department of Computer Science and Technology, University of Cambridge, to arrange an initial chat. Whatever form your project takes, we would aim for your work to be incorporated into one or more of our current research outputs, so that it forms a genuine contribution to the research. A strong background in some combination of functional programming, maths and data science is a must. You can expect to gain experience in programming languages research, data analysis and data visualisation, working in close collaboration with your supervisor.