Abstract
The volume of scientific data produced for and by numerical simulation workflows is increasing at an incredible rate. This raises concerns about computability, interpretability, and sustainability. This is especially noticeable in Earth science (geology, meteorology, oceanography, and astronomy), notably with climate studies. We highlight five main evaluation issues: efficiency, discrepancy, diversity, interpretability, and availability. Among remedies, lossless and lossy compression techniques are becoming popular to better manage dataset volumes. Performance assessment, with comparative benchmarks, requires open datasets shared under FAIR principles (Findable, Accessible, Interoperable, Reusable), provided in an MWE (Minimal Working Example) with ancillary data for reuse. We share
Abbreviations
- CPG: Corner Point Grid
- FAIR: Findable, Accessible, Interoperable, Reusable
- FVF: Formation Volume Factor
- GRDECL: Grid Eclipse
- HPC: High Performance Computing
- HS: HexaShrink
- MWE: Minimal Working Example
Introduction
Science has entered the ‘fourth paradigm’ of data-intensive computing for discovery (Hey et al. 2009). Increasingly accurate models yield unprecedented access to more precise simulations, resorting to high-performance computing (HPC) facilities. The exploitation of massive datasets is, however, hampered by many size-related issues, such as storage, memory, workflow management, and visualisation (Ahrens 2022; Sarton et al. 2023), for instance for machine learning and artificial intelligence tasks (Underwood et al. 2024). As a result, data compression is making a comeback from the influential multimedia era of the 1990s to the many worlds of modelling and simulation. At stake are legal long-term storage issues, for instance in climate modelling (Overpeck et al. 2011), checkpoint restart or snapshotting for fault tolerance in HPC (Yakushin et al. 2020), approximate computing (Mittal 2016), faster selection of parameters with smaller simulation models, progressive result retrieval (Magri and Lindstrom 2024), in situ/in storage processing (Childs et al. 2020), objective and subjective performance evaluation, etc.
For a comprehensive evaluation of compression performance and challenges in modelling (Schweiger et al. 2020; Underwood et al. 2024), several issues deserve attention, sometimes in contrast to what was held true for multimedia data coding. We hereafter detail the five most prominent issues (efficiency, discrepancy, diversity, interpretability, availability) before relating them to geological models in Section 2, thereby motivating the Lundisim mesh dataset (Duval et al. 2025a) detailed in the remainder of the paper. It is available at https://doi.org/10.5281/zenodo.14641958.
Issue 1: Efficiency
Perfect or lossless compression like ‘zip’ notoriously yields very limited reduction ratios. More than a two- to three-fold reduction in size is rare (except for highly structured data). Therefore, approximate, near-lossless, progressive or lossy compression algorithms are required to ensure a significant byte size reduction compatible with the pressing needs arising from gigantic simulation volumes. They, however, entail careful assessments of the data loss impact on performance (Murillo et al. 2002; Taurone et al. 2023; Walters and Wong 2023; Wang et al. 2024), especially for bounding errors (Liang et al. 2022; Liu et al. 2024).
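As a minimal sketch of what an error bound means in lossy coding, the Python snippet below applies uniform scalar quantization with a user-chosen absolute error bound. The function names, the sample values and the bound are illustrative assumptions, not taken from the cited works; actual error-bounded compressors typically add prediction and entropy coding around such a step.

```python
def quantize(values, error_bound):
    """Map each value to an integer bin of width 2 * error_bound."""
    step = 2.0 * error_bound
    return [round(v / step) for v in values]


def dequantize(codes, error_bound):
    """Reconstruct each value as the centre of its bin."""
    step = 2.0 * error_bound
    return [c * step for c in codes]


def max_abs_error(original, reconstructed):
    """Largest pointwise deviation between two sequences."""
    return max(abs(a - b) for a, b in zip(original, reconstructed))


# Hypothetical porosity-like samples (unit proportion).
samples = [0.031, 0.127, 0.254, 0.198, 0.305]
bound = 0.01

codes = quantize(samples, bound)        # small integers, cheaper to entropy-code
restored = dequantize(codes, bound)     # lossy reconstruction
worst = max_abs_error(samples, restored)
print(codes, worst)
```

By construction, each reconstructed value lies within half a bin width of the original, so the worst-case deviation stays at or below the chosen bound (up to floating-point rounding).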
Issue 2: Discrepancy
Simulation datasets are typically very heterogeneous. Coming from different sources, at various workflow steps, they include structured, semi-structured, and unstructured data, are of various dimensionality (1D, 2D, 3D and 4D) and are stored in various types (booleans, discrete labels, integers, floats, etc.) or containers (JSON, HDF, netCDF, FITS, NeXus, ASDF, XML). They cannot be addressed with generic compression tools easily, nor with optimal performance. They require dedicated algorithms, taking into account complicated data morphologies (Klöwer et al. 2021).
Issue 3: Diversity
Mixed and high-dynamics data amalgamate a huge diversity of types, ranges, or statistical distributions, from a handful of finite nominal categories (Likert scale, data labels, attributes) to high-precision values (covering several range scales) with unbalanced histograms (Underwood et al. 2023). At stake here are quantities whose variations have highly non-linear behaviour or non-proportional effects on post-processing or workflows. For instance, small values that would be discarded with traditional lossy compression may need to be faithfully preserved.
Issue 4: Interpretability
Direct interpretation of the different (and often visually combined) types of scientific data (Baker et al. 2016) is less straightforward than with standard audio, image, or video (Poppick et al. 2020). First, it is heavily coupled with physical modelling. Second, models potentially undergo long-lasting simulations whose outputs are subject to a host of objective and subjective assessments. Simulation evaluations gather teams with diverse skills. Their expertise is deployed iteratively at different stages of the workflow. Owing to simulation complexity and compression recency, overall quality assessment is restricted to a small number of individuals from distinct backgrounds, with few universally accepted metrics and a wide range of policy options. Objective losses that are acceptable because they have no influence on simulation may become unacceptable to an expert's subjective interpretation, focusing on specific modalities. In contrast, the knowledge of the human sensory systems and the world-wide dissemination of multimedia devices allowed the persistence, over decades, of widely accepted compression of audio and visual contents.
Issue 5: Availability
Open availability of representative models, under FAIR principles (Peters et al. 2020), is not granted for proprietary uses or ad-hoc data manipulation that cannot be reproduced. One of the co-authors of this paper has, for instance, encountered apparently huge proprietary meshes which, although candidates for potential challenges, turned out to have been artificially inflated by linear interpolation on mid-scale data. This may jeopardise fair compression evaluation, as data becomes highly predictable. Therefore, openly shared geological models that may be modified to adapt to different simulation contexts are convenient.
Geological Model Issues
Aside from gigantic climate-related models, geoscience somehow lacks open, manageable, heterogeneous data models that can be embedded in a processing or simulation workflow. In a similar way to recent initiatives (Alumbaugh et al. 2023; Haehnel et al. 2023), we share with this paper a handful of 3D geological models and their multiscale representations. Developed at IFPEN during Lauriane Bouard's PhD thesis (Bouard 2021), this dataset is collectively denoted as Lundisim. Lundisim is dedicated to performance evaluation and benchmarks around the compression of 3D geological models (Wellmann and Caumon 2018) targeted to simulation workflows, illustrating the five previously raised issues. We named our dataset Lundisim, after the Icelandic name of the (peaceful) Atlantic puffin. This name is a friendly nod to two protagonistic simulation software suites, Petrel and SKUA, named after two (highly competing) seabirds. Lundisim was initially created for testing HexaShrink (HS) (Peyrot et al. 2019), a scalable storage and multiresolution (also called hierarchical (Abraham and Celes 2019; Ceballos et al. 2021; Devarajan et al. 2020; Suter et al. 2019)) visualisation framework for hexahedral meshes with mixed attributes and discontinuities. HS was then integrated into a comprehensive compression workflow, enabling progressive and refinable data representation of composite hexahedral meshes. ‘Composite’ here means that the 3D geometric structure (or grid) may itself be encoded by complementary spatial locations. In computational geology, this geometry is traditionally structured by a Corner Point Grid (CPG): a 1D coordinate system along the vertical direction (‘Pillar’) supports a more horizontal 2D layering (‘Zcorn’). 
This grid may be complemented with numerical properties (porosity, expressed in unit proportion; permeability given in millidarcy or md in the following) and discrete categories (cell activity, rock type) designed from rock physics, for flow simulation in reservoir modelling engineering. There, a geological model may be filled by different stochastic distributions. They account for phenomena representing variations in the underground. Once filled with properties, a reservoir model is simulated under varying operating conditions. Such simulations are used to gain insight on how to manage a storage or production facility on a day-to-day basis.
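To give an order of magnitude for the different components of such a composite mesh, the sketch below counts the stored values for a grid of 128 × 128 × 32 cells (the dimensions quoted later in the paper). It assumes the usual ECLIPSE corner-point conventions, namely six coordinates per pillar and eight corner depths per cell; the function and key names are hypothetical.

```python
def corner_point_grid_sizes(nx, ny, nz):
    """Count stored values for a corner-point grid of nx * ny * nz cells."""
    pillars = (nx + 1) * (ny + 1)      # one pillar per node of the horizontal lattice
    cells = nx * ny * nz
    return {
        "cells": cells,
        "pillar_values": 6 * pillars,  # two (x, y, z) end points per pillar
        "zcorn_values": 8 * cells,     # eight corner depths per cell
        "values_per_property": cells,  # e.g. porosity, permeability, activity
    }


sizes = corner_point_grid_sizes(128, 128, 32)
for name, count in sizes.items():
    print(f"{name}: {count}")
```

Under these assumptions, the corner depths dominate the geometry, while each cell-borne property contributes one value per cell.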
In (Peyrot et al. 2019), we observed that different data in composite meshes distinctly affect compression algorithms. For the sake of completeness, we provide here an illustrative example, based on one of our Lundisim models, described thereafter. The dark blue bars in Figure 1 represent the ‘raw’ number of bits per symbol for various data types in a model cell (i.e., Zcorn, Pillar, Activity, Porosity and Permeability). The orange bar depicts the direct application of the generic yet highly performant lossless LZMA coder (Lempel-Ziv-Markov chain Algorithm (Salomon and Motta 2009, section 6.26)) on all components, with mild average compression (see Issue 1, efficiency, from Section 1). More specifically, we observe that heterogeneous data types have distinct compression ratios (Issue 2, discrepancy). The Boolean Activity property is easily compressed, while Permeability is more challenging. In (Peyrot et al. 2019), we sought both lossless compression and the possibility to address mesh multiresolution. Accordingly, the grey, yellow and light blue bars of Figure 1 indicate the LZMA performance after one, two or three levels of multiscale decomposition of all properties (HS 1, HS 2 and HS 3, respectively).
[IMAGE OMITTED. SEE PDF]
As seen per the average number of bits per symbol, integer geometry (Zcorn and Pillar) is increasingly compressed (though mildly) with resolutions, while the effect on continuous scalar properties (Porosity, Permeability) is slightly degraded due to the high dynamics of the data (Issue 3, diversity).
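The contrast between data types can be reproduced qualitatively with the LZMA coder from the Python standard library. The sketch below uses synthetic stand-ins (randomly drawn activity flags and log-uniform permeability-like values), not the Lundisim data, so the figures it prints are only indicative and do not correspond to those of Figure 1.

```python
import lzma
import random
import struct


def bits_per_symbol(payload, n_symbols):
    """Average number of bits per symbol after LZMA compression."""
    compressed = lzma.compress(payload, preset=9)
    return 8.0 * len(compressed) / n_symbols


rng = random.Random(0)   # fixed seed so the experiment can be repeated
n = 20000

# Activity-like flags, one byte per cell, mostly active (assumed 95 %).
activity = bytes(1 if rng.random() < 0.95 else 0 for _ in range(n))

# Permeability-like values spanning several decades, stored as binary64.
permeability = [10.0 ** rng.uniform(-2.0, 4.0) for _ in range(n)]
permeability_bytes = struct.pack("<%dd" % n, *permeability)

activity_bps = bits_per_symbol(activity, n)
permeability_bps = bits_per_symbol(permeability_bytes, n)
print(f"activity: {activity_bps:.2f} bits/symbol (raw: 8)")
print(f"permeability: {permeability_bps:.2f} bits/symbol (raw: 64)")
```

Real geological properties are spatially correlated, which such random draws do not capture; the sketch only shows how a bits-per-symbol figure can be obtained per data type.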
The above observations were made on losslessly compressed data. In other terms, decompressed data is faithful to the raw model, hence does not hamper workflow precision, notably in a context of simulation. However, simulation practice often resorts to data at coarser resolutions, for speedups and multi-scenario evaluations. In addition, it is well known that different data resolutions (scales) or precision (byte-per-symbol) may subjectively impact a simulation workflow (Issue 4, interpretability). In a typical compress-once/decompress-many context, one may need for instance to address objective mesh size and decompression speed metrics at the beginning of the workflow, and more subjective replays of flow propagation for post-processing. Therefore, our Lundisim dataset contains models at four different levels of resolution to address Issue 5 (availability).
The remainder of the paper is organised as follows. We provide contextual information on reservoir modelling for simulation in Section 3, inspired by the well-known reservoir engineering challenge SPE10. We craft the two main components of Lundisim in Section 4: the common model mesh (Section 4.1) and its SPE10-inherited physical properties (Section 4.2). Ancillary data for simulation are provided in Section 5: global reservoir characteristics (Section 5.1) to allow simulation workflow reproduction (MWE, up to software suite characteristics); the application to fluid production (Section 5.2) with traditional simulation observables. Section 6 details data availability and associated software. The potential reuse of Lundisim and its limits are given in Section 7, before conclusions (Section 8).
Reservoir Modelling for Simulation
We base our work on a previously published challenge known as SPE10, that is, the Tenth SPE Comparative Solution Project (Christie and Blunt 2001) for reservoir simulation. We consider its second problem called Model 2, part of the Brent sequence (quoting), ‘a waterflood of a large geostatistical model chosen so that it was hard (though not impossible) to compute the true fine-grid solution’ (Christie and Blunt 2001, p. 308). In this challenge, eight companies competed to obtain the best possible outcome in the evaluation of this model, using a combination of simulation software and upscaling techniques. Counter-intuitively (since data science is more acquainted with downsampling or downscaling), in the context of reservoir simulation, upscaling and upgridding denote the process by which a fine-scale geological model (a grid assorted with rock properties such as porosity and permeability data) is converted into coarser models that are more computationally tractable, while providing outcomes as close as possible to those expected from the finer grid. Cells of the coarser grid (upgridding (King 2007)) are filled with equivalent properties (upscaling) obtained from finer-resolution cells, using a variety of homogenization or averaging techniques. We refer to (Christie and Blunt 2001; Misaghian et al. 2018; Preux 2014) for details.
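The upgridding/upscaling pair can be sketched on a toy example: 2 × 2 × 2 blocks of fine cells are merged, and each coarse cell receives an averaged property. The choice of an arithmetic mean for porosity and a geometric mean for permeability is only one simple assumption among the many averaging techniques mentioned above; the function names and the cell ordering (first index varying fastest) are also illustrative.

```python
import math


def coarsen(values, nx, ny, nz, average):
    """Merge 2 x 2 x 2 blocks of fine cells into one coarse cell each.

    `values` is a flat list where the first index varies fastest.
    """
    cx, cy, cz = nx // 2, ny // 2, nz // 2
    coarse = []
    for k in range(cz):
        for j in range(cy):
            for i in range(cx):
                block = [
                    values[(2 * i + di) + nx * ((2 * j + dj) + ny * (2 * k + dk))]
                    for dk in (0, 1) for dj in (0, 1) for di in (0, 1)
                ]
                coarse.append(average(block))
    return coarse


def arithmetic_mean(block):
    return sum(block) / len(block)


def geometric_mean(block):
    # Requires strictly positive values (no zero-permeability cells).
    return math.exp(sum(math.log(v) for v in block) / len(block))


# Toy 2 x 2 x 2 model collapsing to a single coarse cell.
porosity = [0.10, 0.20, 0.10, 0.20, 0.30, 0.20, 0.30, 0.20]
permeability = [1.0, 100.0, 1.0, 100.0, 1.0, 100.0, 1.0, 100.0]  # md

coarse_porosity = coarsen(porosity, 2, 2, 2, arithmetic_mean)
coarse_permeability = coarsen(permeability, 2, 2, 2, geometric_mean)
print(coarse_porosity, coarse_permeability)
```

Flow-based upscaling methods used in practice are considerably more involved; the point here is only the reduction of both grid size and cell-borne quantities.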
Upscaling thus reduces the original grid size as well as cell-borne quantities. This results in a global reduction of the size of data with heterogeneous properties, a process similar to what is targeted in genuine data compression, where the modification of data resolution is combined with variations in data precision and additional entropy coding schemes that yield a final compressed file. We refer to (Salomon and Motta 2009) for advanced notions in data coding. Meanwhile, one may ask whether suitable data compression, adapted to geological data, is compatible and even maybe beneficial to flow simulation of large heterogeneous models, as partly exposed in (Bouard et al. 2021; Bouard 2021) (whose outcomes are not required here for further understanding). We now focus on Lundisim benchmark models.
Lundisim
Lundisim Model Mesh
Figure 2 provides an overview of the model hexahedral mesh underlying all Lundisim models. This mesh bears a geological morphology similar to SPE10 dataset 2 (quoting): a ‘simple geometry, with no top structure or faults’. It mainly differs in its lengths in each dimension (chosen as powers of two) and the addition of faults, which are more challenging for upscaling/upgridding, multiscale decomposition, mesh compression (as vertices are non-conforming) and flow simulation (as faults affect fluid displacement).
[IMAGE OMITTED. SEE PDF]
The topography of Lundisim models stems from a realistic reservoir engineering case. It forms one quarter of an anticline structure (Figure 3), common in hydrocarbon trap reservoir study. The highest point () corresponds to the top of the anticline (3360 m depth). The opposite corner is situated 50 m below, on the same horizon.
[IMAGE OMITTED. SEE PDF]
The Lundisim model mesh contains three continuous vertical stair-step faults. Two are apparent in red within Figure 2, the third one bulging from the top-right side. They are not aligned along grid axes and possess different offsets to emulate mildly complex environments. The mesh is composed of 128 × 128 × 32 = 524,288 cells to allow reasonable simulation times (72 h with commercial software PumaFlow on the example given in Section 5.2), yet long enough to call for upscaling when performed on a day-to-day basis. The average cell size represents a volume of size , which is common in sedimentary geology for modelling horizontal fine deposits of geologic material over the years. The numbers of cells (128, 128 and 32) in each dimension are powers of two ($2^7$, $2^7$ and $2^5$). This choice makes it possible to implement the most standard dyadic computations, subsampling or decompositions, to better benchmark compression methods. Indeed, the latter often use fast algorithms or hardware acceleration performed on data chunks. Power-of-two dimensions therefore help to focus on compression impacts, free from uneven-size overheads. This choice allows the mesh dimensions to be reduced by five dyadic scales, down to LEGO brick sizes. Note that size reduction by one or two dyadic scales often suffices. Therefore, in practice, non-dyadic dimensions may be handled by only padding cells to the next multiple of two or four, or using activity labels.
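The dyadic argument can be checked with a few lines of Python. The sketch below lists the grid dimensions after each halving of a 128 × 128 × 32 mesh, and shows one possible padding rule for sizes that are not multiples of the required power of two. Both function names are hypothetical.

```python
def dyadic_levels(dims):
    """List grid dimensions after each halving, while all of them remain even."""
    levels = []
    current = tuple(dims)
    while all(n % 2 == 0 for n in current):
        current = tuple(n // 2 for n in current)
        levels.append(current)
    return levels


def pad_to_multiple(n, levels):
    """Smallest size >= n that is divisible by 2 ** levels."""
    block = 2 ** levels
    return -(-n // block) * block   # ceiling division, then scale back


levels = dyadic_levels((128, 128, 32))
for depth, dims in enumerate(levels, start=1):
    print(depth, dims)

# A hypothetical 130-cell axis padded for two dyadic levels.
print(pad_to_multiple(130, 2))
```

The vertical dimension (32 cells) limits the decomposition to five levels, the coarsest grid having 4 × 4 × 1 cells.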
Lundisim Model Properties
The mesh is enhanced by two continuous petrophysical properties, porosity and permeability, required for the simulation benchmark, partly presented in (Bouard et al. 2021). The spatial distributions of those properties are inspired by two geological formations in (Christie and Blunt 2001): Ness and Tarbert. Note that we do not consider here rock types: though they are important in overall compression schemes (Peyrot et al. 2019), they were not required for our flow simulation purpose. As there is no obvious mapping from one geological object to another, we draw four different stochastic realisations to emulate four distinct environments, from homogeneous to anisotropic, which are displayed in colour scales in Figure 4.
[IMAGE OMITTED. SEE PDF]
The first three correspond to prograding nearshore environments (Tarbert formation) with smooth property variations: nearshore0 and nearshore1 have been generated by an isotropic distribution with different ranges of dependence, while nearshore2 exhibits more anisotropy. The fourth model, named fluvial (Upper Ness formation), exhibits sharper contrasts, with distinctive heterogeneous geological objects. This discrepancy between environments emulates a wide range of petrophysical system behaviours. Lundisim meshes are available at https://doi.org/10.5281/zenodo.14641958.
The design of the initial common grid, the inclusion of faults and property filling have been performed with the Paradigm 3D geological modelling software GOCAD (formerly known as GeOCAD, Geological Objects Computer-Aided Design; now SKUA) and the MATLAB Reservoir Simulation Toolbox (MRST) (Lie 2019).
Simulation Settings
In the absence of a common data exchange format for simulation platforms, we provide ancillary simulation data directly borrowed from the SPE10 Challenge, both as global characteristics and tables. It allows reservoir engineers to mimic our workflow or to adapt it to their own simulation tools. In addition to Lundisim models (provided in the de facto ECLIPSE standard GRDECL), we hence offer a Minimal Working Example (MWE). This baseline simulation workflow ensures a fair comparative benchmark when used on all models, for all the compression methods. Precise outcomes may of course depend on software and expert decisions.
Global Reservoir Characteristics
As for the reservoir model, the rock compressibility is set to and the reservoir pressure is set to 200 bar at the water–oil contact, fixed at a depth of 3410 m. Finally, the reservoir temperature is set to 60°C.
The simulation workflow is based on the test case of a so-called black-oil model (Abraham and Celes 2019). It consists of two liquid phases: water and dead oil (with no gas dissolved). We introduce the Formation Volume Factor (FVF): the ratio of the volumes occupied by a fluid at reservoir conditions versus surface conditions. Quantities below are again borrowed from SPE10, and recalled for completeness. For water, viscosity, density and FVF are computed by correlation from the reservoir simulator PumaFlow. For oil, some quantities are given by tabulations. The viscosity (in centipoise, cP) is computed from Table 1, the density is set to , and the oil FVF () is tabulated in Table 2.
TABLE 1 Oil viscosity in centipoise (cP), tabulated as a function of pressure.
| Pressure (bar) | Viscosity (cP) |
| 50 | 2.85 |
| 200 | 2.99 |
TABLE 2 Oil Formation Volume Factor (FVF), tabulated as a function of pressure.
| Pressure (bar) | Oil FVF |
| 50 | 1.05 |
| 200 | 1.02 |
| 500 | 1.01 |
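Tables 1 and 2 only give a few pressure points. A simulator needs values in between; the sketch below assumes piecewise-linear interpolation with clamping outside the tabulated range. This is a common convention, but it is an assumption here and not necessarily the one used by PumaFlow. The function and constant names are hypothetical.

```python
from bisect import bisect_right

# (pressure in bar, value) pairs copied from Tables 1 and 2.
OIL_VISCOSITY_CP = [(50.0, 2.85), (200.0, 2.99)]
OIL_FVF = [(50.0, 1.05), (200.0, 1.02), (500.0, 1.01)]


def interpolate(table, pressure):
    """Piecewise-linear lookup, clamped to the first and last table entries."""
    pressures = [p for p, _ in table]
    if pressure <= pressures[0]:
        return table[0][1]
    if pressure >= pressures[-1]:
        return table[-1][1]
    upper = bisect_right(pressures, pressure)
    (p0, v0), (p1, v1) = table[upper - 1], table[upper]
    return v0 + (v1 - v0) * (pressure - p0) / (p1 - p0)


print(interpolate(OIL_VISCOSITY_CP, 125.0))   # halfway between the two entries
print(interpolate(OIL_FVF, 350.0))            # halfway between 200 and 500 bar
```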
We now turn to water/oil mixture characteristics, with relative permeabilities for water ($k_{rw}$) and oil ($k_{ro}$), respectively. Given that the latter is obtained from the former by $k_{ro}(S_w) = k_{rw}(1 - S_w)$, where $S_w$ denotes the water saturation, we tabulate relative permeability curves in Table 3 for both water and oil. Here, the irreducible water saturation is 0.2 (Table 3, top of first column) and the residual oil saturation is 0.2 (complement to the 0.8 given in Table 3, bottom of first column).
TABLE 3 Relative permeability curves tabulation for water and oil as a function of water saturation.
| Water saturation | Water relative permeability | Oil relative permeability |
| 0.200 | 0.0000 | 1.0000 |
| 0.250 | 0.0069 | 0.8403 |
| 0.300 | 0.0278 | 0.6944 |
| 0.350 | 0.0625 | 0.5625 |
| 0.400 | 0.1111 | 0.4444 |
| 0.450 | 0.1736 | 0.3403 |
| 0.500 | 0.2500 | 0.2500 |
| 0.550 | 0.3403 | 0.1736 |
| 0.600 | 0.4444 | 0.1111 |
| 0.650 | 0.5625 | 0.0625 |
| 0.700 | 0.6944 | 0.0278 |
| 0.750 | 0.8403 | 0.0069 |
| 0.800 | 1.0000 | 0.0000 |
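The tabulated values appear consistent with a quadratic (Corey-type) law in the normalised water saturation. This closed form is inferred from the numbers in Table 3 and is not stated as such in the text above; the sketch below checks how closely it matches the table, using hypothetical function names.

```python
S_WI = 0.2   # irreducible water saturation (first row of Table 3)
S_OR = 0.2   # residual oil saturation (1 - 0.8, last row of Table 3)


def normalised_saturation(sw):
    """Rescale water saturation to [0, 1] between the two end points."""
    return (sw - S_WI) / (1.0 - S_WI - S_OR)


def krw(sw):
    """Assumed quadratic water relative permeability."""
    return normalised_saturation(sw) ** 2


def kro(sw):
    """Assumed quadratic oil relative permeability."""
    return (1.0 - normalised_saturation(sw)) ** 2


# (water saturation, water rel. perm., oil rel. perm.) copied from Table 3.
TABLE_3 = [
    (0.200, 0.0000, 1.0000), (0.250, 0.0069, 0.8403), (0.300, 0.0278, 0.6944),
    (0.350, 0.0625, 0.5625), (0.400, 0.1111, 0.4444), (0.450, 0.1736, 0.3403),
    (0.500, 0.2500, 0.2500), (0.550, 0.3403, 0.1736), (0.600, 0.4444, 0.1111),
    (0.650, 0.5625, 0.0625), (0.700, 0.6944, 0.0278), (0.750, 0.8403, 0.0069),
    (0.800, 1.0000, 0.0000),
]

max_gap = max(
    max(abs(krw(sw) - water), abs(kro(sw) - oil)) for sw, water, oil in TABLE_3
)
print(f"largest deviation from Table 3: {max_gap:.1e}")
```

The deviation stays below the rounding of the four-decimal tabulation, and the two curves mirror each other around a water saturation of 0.5.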
Application to Fluid Production
We finally present a typical two-phase flow simulated on Lundisim (full resolution, nearshore0 environment). Initially, two phases in the reservoir are horizontally stratified, with oil above water. The two wells are drilled through the whole depth of the reservoir. At , water is injected by in the lower part of the reservoir (Figure 3). The water pressure pushes the oil through the reservoir up to the producer (300 m away). Injector pressure and producer rate remain constant, respectively set at 300 bar and per day.
One valuable indicator in oil production to determine field exploitation is the estimated watercut, that is, the ratio between water and total liquid volumes, at the producer well. It is regularly recorded over a period of time expressed in days (Figure 5). The inflection point (red point on the curve) is the water breakthrough, which denotes the water arrival at . From that instant, the extracted liquid contains more and more water. To avoid an expensive post-processing and optimise the exploitation configuration, reservoir engineers aim to delay this instant. Simulation is a powerful tool to estimate such predicted watercut curves over time. To determine the best exploitation configuration, many simulations with varying parameters can thus be run until satisfaction, involving very long computation times (Bouard et al. 2019).
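A watercut curve is simple to compute once produced volumes are recorded. The sketch below uses made-up production figures and detects breakthrough as the first record whose watercut exceeds a small threshold; this threshold rule is a simplifying assumption, whereas the text above characterises breakthrough by the inflection point of the curve.

```python
def watercut(water_volume, oil_volume):
    """Ratio between water and total liquid volumes at the producer well."""
    total = water_volume + oil_volume
    return water_volume / total if total > 0 else 0.0


def breakthrough_time(times, watercuts, threshold=0.01):
    """First recorded time at which the watercut exceeds the threshold."""
    for time, cut in zip(times, watercuts):
        if cut > threshold:
            return time
    return None   # water has not reached the producer yet


# Hypothetical produced volumes (m3) recorded every 100 days.
days = [0, 100, 200, 300, 400]
water = [0.0, 0.0, 0.0, 5.0, 20.0]
oil = [50.0, 50.0, 50.0, 45.0, 30.0]

curve = [watercut(w, o) for w, o in zip(water, oil)]
arrival = breakthrough_time(days, curve)
print(curve, arrival)
```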
[IMAGE OMITTED. SEE PDF]
This example was simulated at full scale in 72 h with proprietary software PumaFlow. An overview of the overall simulation, from fluid displacement to watercut construction, with a handful of compression-related impacts, is provided in the video added as Supporting Information, available online ().
Data Format and Access
The four Lundisim models presented in this article (one per environment) are provided as ‘Grid Eclipse’ (GRDECL) data, a de facto standard for grids with hexahedral cells, developed by Schlumberger for the ECLIPSE Reservoir Simulator. They are available at Zenodo (https://doi.org/10.5281/zenodo.14641958). Lower resolutions of Lundisim, produced by HexaShrink, are also available.
Lundisim illustrations from Figures 2 and 4 were made with ResInsight (v.2023.06), an open source cross-platform 3D visualisation and post-processing tool for reservoir models and simulations (developed in Python, available for Windows and Linux). Other Python libraries support the GRDECL format, for instance PyGRDECL or XTGeo, and can also be used for visualisation or other processing. Finally, a reviewer successfully imported Lundisim in ParaView.
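For readers who prefer not to depend on a dedicated library, the keyword-based layout of GRDECL files can be read with a few lines of Python. The sketch below is a deliberately minimal reader, assuming the common conventions of ECLIPSE keyword files (`--` comments, a keyword followed by whitespace-separated values, `n*v` repeat counts and a terminating `/`). It is not a complete GRDECL implementation, it does not handle a slash glued to a value, and the sample text is a toy excerpt, not taken from the Lundisim files.

```python
def parse_grdecl(text):
    """Parse 'KEYWORD v1 v2 ... /' blocks into lists of string tokens."""
    data = {}
    keyword = None
    for raw_line in text.splitlines():
        line = raw_line.split("--", 1)[0].strip()   # drop comments
        if not line:
            continue
        for token in line.split():
            if keyword is None:
                keyword = token                     # start of a new block
                data[keyword] = []
            elif token == "/":
                keyword = None                      # end of the current block
            elif "*" in token:
                count, value = token.split("*", 1)  # repeat count, e.g. 3*0.15
                data[keyword].extend([value] * int(count))
            else:
                data[keyword].append(token)
    return data


SAMPLE = """\
-- toy excerpt for illustration only
SPECGRID
 2 2 1 1 F /
PORO
 0.10 2*0.15 0.20 /
ACTNUM
 4*1 /
"""

parsed = parse_grdecl(SAMPLE)
porosity = [float(v) for v in parsed["PORO"]]
print(parsed["SPECGRID"], porosity, parsed["ACTNUM"])
```

For actual work on the dataset, the libraries named above are likely a safer choice, since they cover many more keywords and edge cases.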
Potential Dataset Use/Reuse
Inspired by the SPE10 reservoir simulation challenge (Christie and Blunt 2001), Lundisim with its different environments is primarily meant for evaluating the performance of lossy or lossless compression algorithms with respect to reservoir modelling and simulation. Openly-shared models are scarce in reservoir geoscience and engineering. Lundisim serves other purposes as well.
It can be used to test more geologically-oriented upscaling methods and their reliability regarding information loss, through quality indicators (Preux 2014). While initially developed for hydrocarbons, our approach may be used for sustainability challenges, for instance geothermal energy, hydrogen (H2) or carbon dioxide (CO2) (Alumbaugh et al. 2023) storage projects. Note that the Society of Petroleum Engineers has released a call on the 11th SPE challenge for safe and efficient implementation of geological carbon storage.
Being complex volume meshes, Lundisim models can be used to benchmark scientific data compression algorithms (either on the 3D volume data or meshes). They are also adapted to investigate the impact of reduced data precision (Moreland et al. 2022) or resolution change on pure objective metrics (for instance in a context of mesh visualisation, storage or checkpoint restart), but also on faithfulness of any simulation.
As for precision, current practice favours the IEEE 754 floating-point format, in double, quadruple or even octuple precision (Gladman et al. 2024), to ensure both accuracy and simplicity of data management. As a result, some data fields are represented, stored and transferred with an excessive number of bits (Walters and Wong 2023). In addition, it is increasingly recognised that, for a given simulation workflow, quantities from a homogeneous data field may possess widely different statistical distributions, in which distinct scales of magnitude are associated with different spread/precision/impact. For instance, a permeability value of zero or below 50 md means ‘no to meaningless’ water flows (rocks working as ‘seals’), while values greater by orders of magnitude (over 10,000 md) may yield ‘full permeation’. As a consequence, a fine precision for small permeabilities is meaningful, when higher permeability values would not affect results when changed by . As HPC sparks interest in so-called next-generation arithmetic (such as UNUM or POSIT formats (de Dinechin et al. 2019; Kneusel 2023; Lindstrom 2022)), with so few simulation tools already adapted to such hybrid data formats, it is important to be able to emulate them on shared and representative datasets with minimal examples of workflows.
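One simple way to emulate reduced precision on existing binary64 data is to zero the least significant mantissa bits. The sketch below does so with the standard `struct` module; it is a plain IEEE 754 truncation, not an emulation of UNUM or POSIT arithmetic, and the sample permeability values and the number of retained bits are illustrative assumptions.

```python
import struct


def truncate_mantissa(value, keep_bits):
    """Zero all but the leading `keep_bits` of the 52-bit binary64 mantissa."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    drop = 52 - keep_bits
    bits &= ~((1 << drop) - 1) & 0xFFFFFFFFFFFFFFFF   # clear the low bits
    (reduced,) = struct.unpack("<d", struct.pack("<Q", bits))
    return reduced


# Hypothetical permeabilities (md) spanning several orders of magnitude.
permeabilities_md = [0.37, 48.5, 1234.5, 15000.0]
reduced = [truncate_mantissa(k, 8) for k in permeabilities_md]

absolute_errors = [abs(k - r) for k, r in zip(permeabilities_md, reduced)]
relative_errors = [e / k for e, k in zip(absolute_errors, permeabilities_md)]
for k, r, e in zip(permeabilities_md, reduced, relative_errors):
    print(f"{k:>10} -> {r:>12} (relative error {e:.2e})")
```

With such a truncation the relative error stays bounded (below $2^{-8}$ for eight retained bits), so small permeabilities keep a fine absolute precision while large ones are perturbed by larger absolute amounts, in line with the behaviour discussed above.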
As for resolution, with edge computing, or the necessity sometimes to assess crude estimations in real time (Sicat et al. 2023) on low-power devices using cloud resources, it becomes increasingly important to provide users data with adapted granularity. One straightforward scheme consists in sharing the original data source as well as several lower-resolution versions, either with pyramid schemes (Ceballos et al. 2021) or with embedded multiresolution mechanisms, for instance with wavelets (Christophe et al. 2008; Jacques et al. 2011) as in (Peyrot et al. 2019). For this reason, we provide Lundisim models with their associated lower resolution representations. The latter may also be probed with varying precision, as mentioned previously. Evaluating the combined impact of resolution and precision is briefly evoked in (Bouard et al. 2021), supported by the video available at , and is the topic for the forthcoming companion paper (Duval et al. 2025b). Additional reuse cases reside in combining simulation and compression with machine learning or artificial intelligence tools, which are being used more intensively in simulation (Glaws et al. 2020).
Future research may be interested in larger-size models than those we share here. By providing here the main ingredients and philosophy used to build Lundisim models, we hope they will help in creating novel meshes along our open methodological guidelines.
Conclusion
A couple of years ago, due to the lack of openly shared heterogeneous and realistic geoscience data to study the influence of compression on simulation workflows, we designed our own models, inspired by the SPE10 challenge. For other researchers in this field to overcome this pitfall, we now share our models named Lundisim with the scientific community in the FAIR spirit. Based on a typical geoscientific mesh containing several faults and two formations proposed in SPE10, we generated four models with distinct environments, including porosity and permeability information. Thanks to the multiresolution HexaShrink framework, our dataset also includes lower-resolution versions of each model (mesh and attributes), with consistent fault preservation whatever the level of decomposition. We hope that this dataset will be useful to other geoscience researchers in taking their projects forward.
Acknowledgements
The research presented was mainly performed during the PhD thesis of Lauriane Bouard, following the post-doctoral position of Jean-Luc Peyrot and the internship of Lenaïc Chizat. The authors are grateful to IFP Energies nouvelles (INOVEME) for the permission to share Lundisim models. They acknowledge the support of AIR (Action IUT Recherche) of IUT Côte d'Azur. They thank Emmanuel Christophe (Waymo, formerly CNES), Laurent Astart, Nadine Couëdel, Thomas Guignon (IFP Energies nouvelles) and Corinne Mailhes (TéSA) for their support, and the anonymous reviewers for their insightful suggestions.
Conflicts of Interest
The authors declare no conflicts of interest.
Data Availability Statement
The data that support the findings of this study are openly available in Zenodo at https://doi.org/10.5281/zenodo.14641958.
References
Abraham, F., and W. Celes. 2019. “Multiresolution Visualization of Massive Black Oil Reservoir Models.” Visual Computer 35: 837–848. https://doi.org/10.1007/s00371-019-01674-x.
Ahrens, J. 2022. “Technology Trends and Challenges for Large‐Scale Scientific Visualization.” IEEE Computer Graphics and Applications 42, no. 4: 114–119. https://doi.org/10.1109/mcg.2022.3176325.
Alumbaugh, D., E. Gasperikova, D. Crandall, et al. 2023. “The Kimberlina Synthetic Multiphysics Dataset for CO2 Monitoring Investigations.” Geoscience Data Journal 11: 216–234. https://doi.org/10.1002/gdj3.191.
Baker, A. H., D. M. Hammerling, S. A. Mickelson, et al. 2016. “Evaluating Lossy Data Compression on Climate Simulation Data Within a Large Ensemble.” Geoscientific Model Development 9, no. 12: 4381–4403. https://doi.org/10.5194/gmd-9-4381-2016.
Bouard, L. 2021. “Refinable Resolution and Precision for Volume Mesh Compression and Simulation in Geosciences.” PhD Thesis. Université Côte d'Azur, France.
Bouard, L., L. Duval, F. Payan, C. Preux, and M. Antonini. 2021. “Étude comparative de l'impact d'un codage à précision variable sur des données de simulation en géosciences.” In Proc. Colloque COmpression et REprésentation des Signaux Audiovisuels (CORESA). https://hal.science/hal-03414943.
Bouard, L., L. Duval, C. Preux, F. Payan, and M. Antonini. 2019. “Refinable‐Precision in Mesh Compression for Upscaling and Upgridding in Reservoir Simulation With HexaShrink.” In Proc. RING Meeting. https://hal.science/hal-03138861.
Ceballos, L., B. Conche, G. Dupuy, and D. Patel. 2021. “Visualization of Large Scale Reservoir Models.” In Interactive Data Processing and 3D Visualization of the Solid Earth, edited by D. Patel, 209–232. Springer.
Childs, H., S. D. Ahern, J. Ahrens, et al. 2020. “A Terminology for in Situ Visualization and Analysis Systems.” International Journal of High Performance Computing Applications 34, no. 6: 676–691. https://doi.org/10.1177/1094342020935991.
Christie, M. A., and M. J. Blunt. 2001. “Tenth SPE Comparative Solution Project: A Comparison of Upscaling Techniques.” SPE Reservoir Evaluation & Engineering 4: 308–317. https://doi.org/10.2118/66599-ms.
Christophe, E., C. Mailhes, and P. Duhamel. 2008. “Hyperspectral Image Compression: Adapting SPIHT and EZW to Anisotropic 3‐D Wavelet Coding.” IEEE Transactions on Image Processing 17, no. 12: 2334–2346. https://doi.org/10.1109/tip.2008.2005824.
de Dinechin, F., L. Forget, J. M. Muller, and Y. Uguen. 2019. “Posits: The Good, the Bad and the Ugly.” In Proc. Conf. Next Generation Arithmetic (CoNGA). ACM.
Devarajan, H., A. Kougkas, L. Logan, and X. H. Sun. 2020. “HCompress: Hierarchical Data Compression for Multi‐Tiered Storage Environments.” In IEEE International Parallel and Distributed Processing Symposium. IEEE.
Duval, L., F. Payan, C. Preux, and L. Bouard. 2025a. LUNDIsim: Open‐Data Multiresolution Model Meshes With Porosity and Permeability Properties Inspired From SPE10 Challenge for Geoscience and Data Science Applications (1.0.0) [Data Set]. Zenodo. https://doi.org/10.5281/zenodo.14641958.
Duval, L., F. Payan, C. Preux, and L. Bouard. 2025b. “How Do Reduced Resolution/Precision and Companding on Permeability Data Affect Flow Simulation? Compression Benchmarks on LUNDIsim Mesh Models.” arXiv Preprint arXiv:2508.13636.
Gladman, B., V. Innocente, J. Mather, and P. Zimmermann. 2024. “Accuracy of Mathematical Functions in Single, Double, Double Extended, and Quadruple Precision.” Tech. Rep., LORIA.
Glaws, A., R. King, and M. Sprague. 2020. “Deep Learning for In Situ Data Compression of Large Turbulent Flow Simulations.” Physical Review Fluids 5, no. 11: 114602. https://doi.org/10.1103/physrevfluids.5.114602.
Haehnel, P., H. Freund, J. Greskowiak, and G. Massmann. 2023. “Development of a Three‐Dimensional Hydrogeological Model for the Island of Norderney (Germany) Using GemPy.” Geoscience Data Journal 11: 267–283. https://doi.org/10.1002/gdj3.208.
Hey, T., S. Tansley, and K. Tolle, eds. 2009. The Fourth Paradigm: Data‐Intensive Scientific Discovery. Microsoft Research.
Jacques, L., L. Duval, C. Chaux, and G. Peyré. 2011. “A Panorama on Multiscale Geometric Representations, Intertwining Spatial, Directional and Frequency Selectivity.” Signal Processing 91, no. 12: 2699–2730. https://doi.org/10.1016/j.sigpro.2011.04.025.
King, M. J. 2007. “Recent Advances in Upgridding.” Oil & Gas Science and Technology‐Revue de L'institut Français du Pétrole 62, no. 2: 195–205. https://doi.org/10.2516/ogst:2007017.
Klöwer, M., M. Razinger, J. J. Dominguez, P. D. Düben, and T. N. Palmer. 2021. “Compressing Atmospheric Data Into Its Real Information Content.” Nature Computational Science 1, no. 11: 713–724. https://doi.org/10.1038/s43588-021-00156-2.
Kneusel, R. T. 2023. “Posits.” In Numbers and Computers, 277–302. Springer.
Liang, X., B. Whitney, J. Chen, et al. 2022. “MGARD+: Optimizing Multilevel Methods for Error‐Bounded Scientific Data Reduction.” IEEE Transactions on Computers 71, no. 7: 1522–1536. https://doi.org/10.1109/tc.2021.3092201.
Lie, K. A. 2019. An Introduction to Reservoir Simulation Using MATLAB/GNU Octave. Cambridge University Press.
Lindstrom, P. 2022. “MultiPosits: Universal Coding of ℝⁿ.” In Proc. Conf. Next Generation Arithmetic (CoNGA), 66–83. Springer.
Liu, J., J. Tian, S. Wu, et al. 2024. cuSZ‐i: High‐Fidelity Error‐Bounded Lossy Compression for Scientific Data on GPUs. https://doi.org/10.48550/ARXIV.2312.05492.
Magri, V. A. P., and P. Lindstrom. 2024. “A General Framework for Progressive Data Compression and Retrieval.” IEEE Transactions on Visualization and Computer Graphics 30: 1358–1368. https://doi.org/10.1109/tvcg.2023.3327186.
Misaghian, N., M. Assareh, and M. Sadeghi. 2018. “An Upscaling Approach Using Adaptive Multi‐Resolution Upgridding and Automated Relative Permeability Adjustment.” Computational Geosciences 22: 261–282. https://doi.org/10.1007/s10596-017-9688-2.
Mittal, S. 2016. “A Survey of Techniques for Approximate Computing.” ACM Computing Surveys 48, no. 4: 1–33. https://doi.org/10.1145/2893356.
Moreland, K., D. Pugmire, and J. Chen. 2022. “The Exploitation of Data Reduction for Visualization.” Tech. Rep. ORNL/LTR‐2022/412, Oak Ridge National Laboratory.
Murillo, R., A. A. Del Barrio, and G. Botella. 2002. “The Effects of Numerical Precision in Scientific Applications.” Proceedings of the Annual Modeling and Simulation Conference.
Overpeck, J. T., G. A. Meehl, S. Bony, and D. R. Easterling. 2011. “Climate Data Challenges in the 21st Century.” Science 331: 700–702. https://doi.org/10.1126/science.1197869.
Peters, K., H. Höck, and H. Thiemann. 2020. “FAIR Long Term Preservation of Climate and Earth System Science Data With a Focus on Reusability at the World Data Center for Climate (WDCC).” ESS Open Archive: 10501879. https://doi.org/10.1002/essoar.10501879.1.
Peyrot, J. L., L. Duval, F. Payan, et al. 2019. “HexaShrink, an Exact Scalable Framework for Hexahedral Meshes With Attributes and Discontinuities: Multiresolution Rendering and Storage of Geoscience Models.” Computational Geosciences 23: 723–743. https://doi.org/10.1007/s10596-019-9816-2.
Poppick, A., J. Nardi, N. Feldman, A. H. Baker, A. Pinard, and D. M. Hammerling. 2020. “A Statistical Analysis of Lossily Compressed Climate Model Data.” Computers & Geosciences 145: 104599. https://doi.org/10.1016/j.cageo.2020.104599.
Preux, C. 2014. “About the Use of Quality Indicators to Reduce Information Loss When Performing Upscaling.” Oil & Gas Science and Technology‐Revue d'IFP Energies Nouvelles 71, no. 1: 7. https://doi.org/10.2516/ogst/2014023.
Salomon, D., and G. Motta. 2009. Handbook of Data Compression. Springer.
Sarton, J., S. Zellmann, S. Demirci, et al. 2023. “State‐Of‐The‐Art in Large‐Scale Volume Visualization Beyond Structured Data.” Computer Graphics Forum 42, no. 3: 491–515. https://doi.org/10.1111/cgf.14857.
Schweiger, G., H. Nilsson, J. Schoeggl, W. Birk, and A. Posch. 2020. “Modeling and Simulation of Large‐Scale Systems: A Systematic Comparison of Modeling Paradigms.” Applied Mathematics and Computation 365: 124713. https://doi.org/10.1016/j.amc.2019.124713.
Sicat, R., M. Ibrahim, A. Ageeli, F. Mannuss, P. Rautek, and M. Hadwiger. 2023. “Real‐Time Visualization of Large‐Scale Geological Models With Nonlinear Feature‐Preserving Levels of Detail.” IEEE Transactions on Visualization and Computer Graphics 29, no. 2: 1491–1505. https://doi.org/10.1109/tvcg.2021.3120372.
Suter, E., T. Kårstad, A. Escalona, H. A. Friis, and E. H. Vefring. 2019. “Principles for a Hierarchical Earth Model Representation Aiming Towards Fit‐For‐Purpose Grid Resolution.” In Proc. EAGE Conf. Tech. Exhib. EAGE.
Taurone, F., D. E. Lucani, M. Fehér, and Q. Zhang. 2023. “Lossless Preprocessing of Floating Point Data to Enhance Compression.” In Proc. Int. Conf. Distributed Computing and Artificial Intelligence, Lecture Notes in Networks and Systems, 457–466. Springer.
Underwood, R., J. Bessac, D. Krasowska, J. C. Calhoun, S. Di, and F. Cappello. 2023. “Black‐Box Statistical Prediction of Lossy Compression Ratios for Scientific Data.” International Journal of High Performance Computing Applications 37, no. 3–4: 412–433. https://doi.org/10.1177/10943420231179417.
Underwood, R., J. C. Calhoun, S. Di, and F. Cappello. 2024. Understanding the Effectiveness of Lossy Compression in Machine Learning Training Sets. https://doi.org/10.48550/ARXIV.2403.15953.
Walters, M. S., and D. C. Wong. 2023. “The Impact of Altering Emission Data Precision on Compression Efficiency and Accuracy of Simulations of the Community Multiscale Air Quality Model.” Geoscientific Model Development 16, no. 4: 1179–1190. https://doi.org/10.5194/gmd-16-1179-2023.
Wang, D., J. Pulido, P. Grosset, et al. 2024. “TAC+: Optimizing Error‐Bounded Lossy Compression for 3D AMR Simulations.” IEEE Transactions on Parallel and Distributed Systems 35, no. 3: 421–438. https://doi.org/10.1109/tpds.2023.3339474.
Wellmann, F., and G. Caumon. 2018. “3‐D Structural Geological Models: Concepts, Methods, and Uncertainties.” In Advances in Geophysics, vol. 59, 1–121. Elsevier.
Yakushin, I., K. Mehta, J. Chen, et al. 2020. “Feature‐Preserving Lossy Compression for In Situ Data Analysis.” In Proc. Int. Conf. Parallel Processing. ACM.
© 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License").