ICCS logo

GitHub · POPL 2022 paper · POPL talk

Fluid: data-linked visualisations

fluid

Understanding charts, diagrams and other visual information means understanding what visual elements represent: how specific visual elements relate to the data they were computed from. Fluid is an experimental programming language, implemented in PureScript, whose outputs are automatically linked to inputs in a fine-grained way. A Fluid program is a regular functional program, but instead of simply producing an opaque value, Fluid uses a bidirectional dynamic analysis to equip that value with information about how its parts relate to specific parts of the input. We can use that information to automatically provide additional UI affordances, allowing users to explores those connections without the programmer having to build those features in explicitly. For example if the user selects a bar in a bar chart computed from a table of data, we can automatically provide a view of the underlying table containing only the data needed to compute the selected bar.

data-linked visualisations

Try the data linking by selecting (clicking on) the bar for Germany in the left-hand chart below. When a view selection is active, we can automatically generate what we call a data selection: a restricted view of a dataset where only relevant rows are shown and relevant cells within those rows are highlighted. Now try selecting some of the data points for the years 2017 and 2018 in the right-hand chart. (Zoom in if it's hard to select the points.) The relationship between the output selection and the resulting input selection is monotonic: the more output you select, the more input is selected. (The UI doesn't support this yet, but it is also meaningful to select input and have the output selection be derived automatically.)

Hand-crafting transparency features like these would involve manually tracking the dependencies between input and output. Fluid does this automatically, allowing the programmer to simply express the bar chart and line chart as pure functions, using standard functional programming features such as higher-order functions and list comprehensions. Take a look at the source code below:

bar-chart.fld
line-chart.fld
renewables.fld

The mathematics of the bidirectional analysis is described by a Galois connection. This establishes that the forward and backward parts of the analysis are adjoint, meaning they almost invert each other in the following sense: the data selection computed by the backwards analysis for a given view selection, if fed into the forward analysis, will suffice for (at least) the original view selection. Similarly, the view selection computed by the forwards analysis, if fed into the backwards analysis, will require (no more than) the original data selection. (“Require no more, promise no less” is an apt slogan.) See our paper for more on the formal setting.

brushing and linking

Because every selection has a complement, the Galois connection has a De Morgan dual. The action of the dual on Galois connections is contravariant: it inverts the direction of the bidirectional dependency analysis, by transposing the roles of the two adjoints. Why invert something that is already bidirectional? Well, the De Morgan dual also changes the meaning of the analyses: under the dual interpretation, the backwards analysis propagates the absence of demand, rather than demand, and the forwards analysis propagates resource unavailability, rather than availability.

It turns out we can use this to link output selections to the related parts of other outputs generated from the same input, a feature is called brushing and linking in the data visualisation literature. Go back to the example above and select the bar for USA or China in the left-hand chart: all points relating to 2015 will be selected selected on the right. These are the parts of the right-hand chart which “compete for resources” with the selection on the left, the sense that they demand data which intersects in a non-empty way with the data demanded by the original selection. We compute the selection on the right not with the backwards analysis (which points in the wrong direction) nor with the forwards analysis (which computes the sufficiency relation), but instead with the De Morgan dual of the forwards analysis. The De Morgan dual of the forwards analysis determines the output on the right that we wouldn't be able to compute if the data demanded by the selection on the left were unavailable: in other words, it determines what the data selection is necessary for, rather than sufficient for. You can play this game in the other direction by selecting any point from 2015 on the right, which will highlight both USA and China on the left: again, because these bars both consume data which is also needed for the original point. (We could making this clearer in the UI by showing the data selection induced by USA and China, perhaps greyed out, and highlighting how it overlaps with the data needed by the original point.)

Brushing and linking is a common feature in data visualisation, but existing approaches are ad hoc and somewhat limited. Building it into a language has the potential to make it pervasive and robust. Moreover, we can always provide a view of the specific data elements which explain why the two visual elements are linked, providing another level of transparency.

infrastructure for
explorable explanations

Another future application is language infrastructure for explorable explanations, interactive web essays that explain challenging technical ideas. How might programming languages help with these? One idea would be a runtime which allows a user to drill down into a computed artefact such as an image, and explore the relationships between the stages of the computational pipeline using interactions like the ones above.

Consider the following implementation of matrix convolution, an image processing technique which transforms an input matrix using a small matrix called a filter (or a kernel) to apply effects like blurring or embossing. Trying selecting different cells in the output (the matrix on right) to see how they demand cells in the input array and filter:

In particular, notice how only certain parts of the input are relevant to a given output cell. Can some of the irrelevant inputs be attributed to zeros in the filter? Also notice how the demand varies as we approach the edge of the output image: because this implementation of convolution treats the input as though it were padded with zeros at the boundary, parts of the filter are irrelevant to the computation of output cells near the edge. (As mentioned above, it is not yet possible to select input elements and have the output selection be derived automatically.) Interactions like these would be more useful if we could show the actual computation involved for a given element, but even this simple extensional view already reveals interesting things about the implementation. Here is the pure functional code:

convolution.fld
emboss-zero-pad.fld

The library function convolve implements the convolution algorithm itself, and helper functions zero, wrap and extend implement specific policies (“methods”) for dealing with the boundary.

Contrast the example above, which uses zero as the boundary method, with one which uses wrap. Then selecting an output cell near the edge has a quite different behaviour: instead of some of the filter elements becoming irrelevant, we can see visually that the input image behaves as though opposite sides were connected. These examples show how making the input-output relationships fine-grained rather than monolithic reveals quite a lot about what a computation actually does.

emboss-wrap.fld

project details

Fluid is being developed by Roly Perera, Minh Nguyen and Joe Bond. Our pilot project was funded by The Alan Turing Institute's AI for Science and Government programme. Follow us on GitHub, or get in touch by email.