Fluid is an exciting opportunity to work on new programming language foundations designed to make data science more open, intelligible and accessible. If you are:
then we may have an opportunity for you; see the specific schemes below. Also check out our poster.
Charts and other visual summaries, curated by journalists and scientists from real-world data and simulations, are how we understand our changing world and the anthropogenic sources of that change. But the visual artifacts we are actually presented with are opaque: any relationship to the underlying data is lost. How can we expect to understand, critique or evaluate claims based on a bitmap? This is challenging enough for an expert with access to the source code and data used to derive the outputs; for a non-expert the prospects are even worse.
Fluid is a new “transparent” programming language, being developed at the Institute of Computing for Climate Science in Cambridge in collaboration with the University of Bristol, that makes it easy to create charts and figures which are linked to data, enabling a user to interactively discover what visual elements actually represent. The key idea is to incorporate a bidirectional dynamic dependency analysis into the language runtime, allowing it to track dependencies that arise as outputs (such as charts and tables) are computed from data. This information is then used to automatically enrich rendered output with interactions which allow a reader to explore the relationship to data directly through the artefact, by selecting visual features of interest. Fluid uses so-called “program slicing” techniques based on Galois connections, a neat mathematical abstraction which characterises exactly the relationship between sets of inputs and sets of outputs which depend on them.
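To make the idea of a Galois connection between sets of inputs and sets of outputs more concrete, here is a minimal sketch in Python. It is a toy illustration only, not Fluid's implementation: the dependency relation `DEPENDS`, the names `row1` to `row3` and `bar_a`/`bar_b`, and all function names are hypothetical, and the dependencies are written down by hand rather than tracked by a language runtime.

```python
from itertools import chain, combinations

# Hypothetical dependency relation: (input, output) means the output
# was computed using that input. In this toy example it is hand-written.
DEPENDS = {
    ("row1", "bar_a"),
    ("row2", "bar_a"),
    ("row2", "bar_b"),
    ("row3", "bar_b"),
}
INPUTS = frozenset(i for i, _ in DEPENDS)
OUTPUTS = frozenset(o for _, o in DEPENDS)


def forward(xs):
    """Outputs that depend on at least one input in xs."""
    return frozenset(o for i, o in DEPENDS if i in xs)


def backward(ys):
    """Largest set of inputs whose dependent outputs all lie within ys."""
    return frozenset(i for i in INPUTS if forward({i}) <= ys)


def needed_by(ys):
    """Inputs that at least one output in ys depends on."""
    return frozenset(i for i, o in DEPENDS if o in ys)


def subsets(items):
    """All subsets of items, as frozensets."""
    items = list(items)
    combos = chain.from_iterable(
        combinations(items, n) for n in range(len(items) + 1)
    )
    return [frozenset(c) for c in combos]


def is_galois_connection():
    """Check forward(X) <= Y  iff  X <= backward(Y), for all X and Y."""
    return all(
        (forward(xs) <= ys) == (xs <= backward(ys))
        for xs in subsets(INPUTS)
        for ys in subsets(OUTPUTS)
    )
```

In this sketch, `needed_by({"bar_a"})` answers the reader's question "which data is behind this bar?", while `forward` and `backward` form the adjoint pair: `is_galois_connection()` checks exhaustively that `forward(X)` is contained in `Y` exactly when `X` is contained in `backward(Y)`.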
A key use case for Fluid is to make it easy to present real-world climate science in “long-form” explorable essays and interactive articles intended for non-specialist audiences, such as policymakers. (See distill.pub for some examples of so-called “explorable explanations”.)
We have up to 4 summer internship positions available, funded through the following schemes, with at least one more position opening later in the Spring. These are all widening participation internships, meaning you must meet at least one of the socio-economic or underrepresentation criteria listed on the web page for the particular scheme:
You must also be in your penultimate or final year of your undergraduate degree and on course for a 1st or 2:1.
Your internship project will live somewhere at the intersection of programming languages research, AI/ML and HCI, depending on your skills/background and research interests. There are two main goals driving the next iteration of Fluid; both involve using (black-box) LLMs to achieve goals relating to transparency and open science.
A) Extending Fluid with computational explanations: information about the specific steps that were involved in computing a particular feature of the output (e.g. the whiskers decorating a bar in a bar chart). This is a potentially powerful transparency feature, allowing readers (perhaps during peer review) to discover otherwise hidden or obscured facts about the data underpinning a visualisation. One internship project will involve turning these computational explanations into more user-friendly natural language explanations that would be useful for lay readers as well as expert readers. The novel contribution that Fluid can make to this problem is to provide an authoritative ground truth for the generation of the natural language, offering the prospect of a “trusted” or reliable form of open, self-explanatory artifact. (See “AI reading assistant” in the poster.)
B) Extending Fluid with transparent text: natural language (such as the expository text in a climate report for policymakers) which is underwritten by a semi-formal computational interpretation, especially quantitative phrases or other fragments of text expressing data-driven claims. For example, the statement that under a particular emissions scenario, global warming is extremely likely to exceed 2°C in the 21st century can be underwritten by a Fluid program that assigns a specific interpretation to this text in terms of the distributions of the underlying data used (by the report author) to reach that conclusion. Another internship project will be to develop AI tooling which replaces fragments of text by expressions that compute that text from data. (See “AI authoring assistant” in the poster.)
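As a rough illustration of what "text computed from data" could look like, the sketch below derives the likelihood phrase in the example claim from a set of projections. It is written in Python rather than Fluid, and everything in it is an assumption made for illustration: the warming values are invented, the function names are hypothetical, and the mapping from probabilities to phrases is loosely modelled on the IPCC's calibrated likelihood language, which the report author may or may not be using.

```python
# Hypothetical ensemble of projected 21st-century warming values (°C).
# These numbers are invented for illustration; they are not real data.
PROJECTED_WARMING_C = [
    2.4, 2.9, 3.1, 2.2, 2.7, 3.4, 2.6, 2.1, 2.8, 3.0,
    2.5, 3.2, 2.3, 2.9, 3.6, 2.7, 1.9, 2.6, 3.3, 2.8,
]

# Lower probability bounds for likelihood phrases, most specific first.
# Assumed scale, loosely following IPCC calibrated language; the finer
# terms below "unlikely" are omitted to keep the sketch short.
LIKELIHOOD_TERMS = [
    (0.99, "virtually certain"),
    (0.95, "extremely likely"),
    (0.90, "very likely"),
    (0.66, "likely"),
    (0.33, "about as likely as not"),
]


def exceedance_probability(values, threshold):
    """Fraction of projections strictly above the threshold."""
    return sum(v > threshold for v in values) / len(values)


def likelihood_phrase(p):
    """Most specific phrase whose lower bound p meets."""
    for lower, phrase in LIKELIHOOD_TERMS:
        if p >= lower:
            return phrase
    return "unlikely"


def render_claim(values, threshold):
    """Compute the claim text from the data instead of hard-coding it."""
    p = exceedance_probability(values, threshold)
    return (
        f"global warming is {likelihood_phrase(p)} "
        f"to exceed {threshold:g}°C in the 21st century"
    )


print(render_claim(PROJECTED_WARMING_C, 2.0))
```

With the invented values above, 19 of the 20 projections exceed 2°C, so the rendered sentence uses "extremely likely"; changing the data changes the wording, which is the link between text and evidence that transparent text is meant to expose.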
University of Cambridge MPhil or Part III students starting October 2025 may be interested in the following opportunity.
The live demos on the Fluid website only scratch the surface of what transparent programming languages like Fluid should enable when fully realised. There are many opportunities for an imaginative and technically strong student to help move this idea forward as part of an MPhil project. Your research could go in a number of directions, depending on whether your interests lie more towards programming languages, AI/ML or HCI. A programming languages project might extend Fluid into a literate programming tool, by adding Markdown support and the ability to embed computational content via a Lisp-style backquote mechanism. A more mathematical project might add multidimensional arrays to the language, along with various array operations inspired by linear algebra and an extension of the dependency analysis to these new operations. An AI/ML project might generate natural language explanations from provenance traces, or interpret fragments of natural language as having a formal underpinning in an evidence base.
If you think any of this sounds interesting, please get in touch with Dr Roly Perera, Early Career Advanced Fellow, Institute of Computing for Climate Science, University of Cambridge, to arrange an initial chat. Whatever form your project takes, we would aim for your work to be incorporated into our main development codebase, so that it forms a genuine contribution to the overarching project. You will get to present your work to researchers and data scientists at the Institute of Computing for Climate Science and The Alan Turing Institute, and work with PhD students at Cambridge and Bristol. A strong background in functional programming, maths and/or science is a must. You can expect to gain experience in programming languages research, data analysis and data visualisation, with close supervisor support.