
Abstract

Both remote sensing and the medical field have benefited greatly from machine learning methods originally developed for computer vision and multimedia. We investigate the applicability of the same data mining-based machine learning (ML) techniques for exploring the structure of both Earth observation (EO) and medical image data. We use a Support Vector Machine (SVM) as an explainable active learning tool to discover the semantic relations between the EO image content classes, and we extend this technique to medical images of various types. The EO image dataset was acquired by multispectral and radar sensors (WorldView-2, Sentinel-2, TerraSAR-X, Sentinel-1, RADARSAT-2, and Gaofen-3) over four different urban areas. In addition, medical images were acquired by camera, microscope, and computed tomography (CT). The methodology has been tested by several experts, and the semantic classification results were checked either by comparing them with reference data or through the feedback given by these experts in the field. The accuracy of the results amounts to 95% for the satellite images and 85% for the medical images. This study opens the pathway to correlating the information extracted from EO images (e.g., quality-of-life-related environmental data) with that extracted from medical images (e.g., medical imaging disease phenotypes) to obtain geographically refined results in epidemiology.


1. Introduction

In recent years, there has been a rapid increase in available data, and many data sources have now reached the status of Big Data. Big Data can be described in five different ways (called the 5 V’s): “by an enormous data volume, exhibiting a wide variety of types, requiring a high velocity of data processing, a need to deal with the veracity of data uncertainty, and to transform the data into value” [1].

In Earth observation (EO), the Copernicus Open Access Hub provides complete, free, and open access to Sentinel-1, Sentinel-2, Sentinel-3, and Sentinel-5P data products. With the launch of the Copernicus Sentinels and several new national missions [2], the volume of EO data has dramatically increased, reaching hundreds of petabytes [3]; extracting the valuable information from these data has become a significant challenge. By December 2021, 45 million Sentinel products had been acquired since the start of operations, and 410 PB of downloads had been generated by more than 500,000 users. Recent reports [4] show that the average volume of data generated per day by the Copernicus programme is 20.2 TB, divided as follows: Sentinel-1 data 31%, Sentinel-2 data 47%, Sentinel-3 data 20%, and Sentinel-5P data 2%. The open and free access to such data opens new opportunities and poses challenges to the storage, processing, and analysis of such massive data. A state-of-the-art survey of several land cover datasets for multispectral and synthetic aperture radar (SAR) sensors is provided in [5].

In the medical field, a large number of medical datasets are available [6] linked to different diseases [7]. The only restriction is that medical data are sometimes governed by patient data protection rules. For example, regarding the neoplastic disease that is our case of investigation in this paper, the Cancer Imaging Archive [8] is one of the most extensive publicly available medical archives, funded by the Cancer Imaging Programme, which is part of the United States National Cancer Institute and managed by the Frederick National Laboratory for Cancer Research. The dataset contains images acquired from patients with cancer of different organs (e.g., lung, prostate, liver, breast, and colon); these images were acquired by various sensors/devices (e.g., magnetic resonance imaging, computed tomography, and digital histopathology). Another freely available dataset is the one provided by WebPathology [9], which contains high-quality pathology images of benign and malignant neoplasms and related entities.

In this paper, we explain how information extracted from images (whether Earth observation or medical) using a data mining system can help users better understand the image content after semantic labelling. The first investigation is therefore carried out with EO images, for which a validation already exists, and then continues with the medical images.

Here, we present a machine learning system developed for EO images (first for radar and then extended to multispectral data). For demonstration, several cities were selected, and depending on the availability of the various sensors, we acquired various images (either radar or multispectral). Finally, after the system had been operated and validated by an EO expert, we extended it to medical images.

This article touches on many Big Data notions, such as the large volume of data in both fields. These data need to be interpreted and understood using machine learning techniques (e.g., data mining) and semantically labelled. To show the connections between the semantic classes extracted from different sensors and between the two domains, we use knowledge graphs. Finally, we involve expert users in both fields to understand and explain these results. In the future, we would like to develop methods for the explainability and trustworthiness of the developed machine learning techniques.

Previous studies linking remote sensing and medicine have primarily focused on correlating environmental factors with disease incidence or using remote sensing-derived variables as external inputs to epidemiological models. In contrast, our work directly transfers semantic labelling and active learning workflows originally developed for Earth observation image analysis to medical imaging. This enables a unified framework, where environmental and clinical data can be analysed within the same methodological pipeline, facilitating integrative studies at the intersection of public health and precision medicine.

In the Big Data era, we face enormous data volumes delivered daily by current instruments and stored in archives. As a result, we need processing systems to extract knowledge from such extensive archives. This paper is organised as follows: In Section 2, we present the implementation details of the proposed workflow method and describe the characteristics of each dataset compiled for each set of images. In Section 3, we show the classification results obtained by applying the proposed method and discuss specific details for each individual set of images acquired by the different sensors. Further, we detail the information that can be extracted using such a method from the EO and medical images. In Section 4, we analyse the findings of the current study in light of existing related studies on EO and medical images. This section ends with perspectives and future work, followed by the concluding remarks in Section 5.

2. Materials and Methods

2.1. Image Processing Workflow

The system’s workflow for generating semantic classification results is based on previous work described in [10], in which a satellite image time series was analysed in terms of temporal changes. However, the methodology now includes efficient exploitation of EO multi-sensor data and concentrates mainly on urban areas [5,11,12]. The novelty of the present work is the analysis of medical images using the same methodology.

The overall processing workflow is depicted in Figure 1. The methodology comprises six steps and is used for handling and analysing EO images and medical images to create semantic maps, domain ontologies, and knowledge-graph representations. In addition, this workflow can be used to create the city models based on selected EO images or patient models under medical follow-up based on collected medical images.

The key elements in image processing are, hence, the application of Gabor filters with 5 scales and 6 orientations to EO image patches, following the configuration established by [13,14,15]. For medical images, Weber Local Descriptors with 8 orientations and 18 excitation levels were used [15,16]. These parameters were selected to balance texture sensitivity and computational feasibility, enabling direct methodological transfer across domains. Classification was performed using a Support Vector Machine (SVM) with a chi-squared kernel, one-against-all strategy, and 5-fold cross-validation. Active learning cycles involved iterative expert feedback, with domain experts labelling positive and negative examples until semantic convergence was achieved. The model training using SVM with active learning and relevance feedback, following the workflow described in our previous EO publications, is summarised in Table 1.
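As a concrete illustration of the classification setup just described, the following sketch shows how a chi-squared-kernel SVM with a one-against-all strategy and 5-fold cross-validation could be assembled with scikit-learn. The feature matrix, labels, and the value of C are placeholder assumptions, not the actual study data or tuned parameters.

```python
# Sketch only: chi-squared-kernel SVM, one-against-all, 5-fold cross-validation.
# Features, labels, and C are hypothetical placeholders, not the study data.
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.model_selection import StratifiedKFold
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((600, 60))          # e.g., 60 Gabor coefficients per patch (assumed)
y = rng.integers(0, 5, size=600)   # e.g., 5 semantic classes (assumed)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
accuracies = []
for train_idx, test_idx in cv.split(X, y):
    K_train = chi2_kernel(X[train_idx], X[train_idx])   # square train-vs-train kernel
    K_test = chi2_kernel(X[test_idx], X[train_idx])     # test-vs-train kernel
    clf = OneVsRestClassifier(SVC(kernel="precomputed", C=10.0))
    clf.fit(K_train, y[train_idx])
    accuracies.append(clf.score(K_test, y[test_idx]))

print("5-fold accuracy: %.2f +/- %.2f" % (np.mean(accuracies), np.std(accuracies)))
```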

These steps are as follows:

Step 1: During a preparatory phase, the experts select the dates and target areas acquired by different satellites and download them from typical archives such as the ESA Copernicus hub [17] or the DLR archives [18]. Each EO acquisition from the archive has two parts: the image and the metadata. Later, these two parts are used during operations, as shown in Figure 1. Like EO images, the medical images are acquired by various devices/sensors operated by experts and stored on servers as images and metadata associated with each medical image.

Step 2: Tile each EO image into non-overlapping patches with a pre-selected size, depending on the actual pixel ground sampling distance, so that the objects on the ground are covered (see Table 2). The tiling is applied in the same way for the medical images, with no overlapping between the patches and with a patch size adapted to the content of the medical image (see Table 3). The size of the patch should be adapted to the image resolution and its content, so that the patch includes (as much as possible) a single object [11,12].
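A minimal sketch of this tiling step is given below; the 1024 × 768 pixel image and the 16 × 16 pixel patch size are example values only, and incomplete border patches are simply discarded.

```python
# Sketch of Step 2: tiling an image into non-overlapping square patches.
# Incomplete border patches are discarded; image and patch sizes are examples.
import numpy as np

def tile_image(image, patch_size):
    """Return a list of non-overlapping patches and the (rows, cols) grid shape."""
    rows = image.shape[0] // patch_size
    cols = image.shape[1] // patch_size
    patches = []
    for r in range(rows):
        for c in range(cols):
            patches.append(image[r * patch_size:(r + 1) * patch_size,
                                 c * patch_size:(c + 1) * patch_size])
    return patches, (rows, cols)

# a hypothetical 1024 x 768 RGB image tiled into 16 x 16 pixel patches
image = np.zeros((768, 1024, 3), dtype=np.uint8)
patches, grid = tile_image(image, 16)
print(len(patches), grid)   # 3072 patches on a 48 x 64 grid
```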

Step 3: Extract the primitive features that describe the content of each original patch. For the EO images, one applies Gabor filters with 5 scales and 6 orientations [13,14,15], while for the medical images, one applies Weber local descriptors [15] with 8 orientations and 18 excitation levels, or multispectral histograms with 64 bins. In the case of Gabor filtering, we extract the Gabor coefficients from each image patch and compute the means and standard deviations of each set of coefficients (in total, 5 × 6 × 2 = 60 coefficients). In the case of Weber local descriptors, the features extracted from each patch form a set of 144 (i.e., 8 × 18) coefficients. A detailed study of various primitive feature extraction methods and of different values of their parameters can be found in [5,19].
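The sketch below illustrates the two feature extractors named in this step. The Gabor part follows the 5 scales × 6 orientations × (mean, standard deviation) = 60-coefficient scheme above; the Weber part is a simplified 8 × 18 = 144-bin joint histogram that only approximates the descriptor cited in [15]. The filter frequencies and the placeholder patch are assumed values.

```python
# Sketch of the Step 3 feature extractors (Gabor bank and a simplified Weber-style
# descriptor). Frequencies and the placeholder patch are assumptions.
import numpy as np
from skimage.filters import gabor

def gabor_features(patch, frequencies=(0.05, 0.1, 0.2, 0.3, 0.4), n_orientations=6):
    """Mean and standard deviation of the Gabor magnitude per scale/orientation."""
    feats = []
    for f in frequencies:                                    # 5 scales
        for k in range(n_orientations):                      # 6 orientations
            real, imag = gabor(patch, frequency=f, theta=k * np.pi / n_orientations)
            mag = np.hypot(real, imag)
            feats.extend([mag.mean(), mag.std()])
    return np.asarray(feats)                                 # 5 * 6 * 2 = 60 values

def weber_features(patch, n_orient=8, n_excitation=18):
    """Simplified Weber-style descriptor: joint histogram of differential
    excitation and gradient orientation (8 x 18 = 144 bins)."""
    patch = patch.astype(np.float64)
    gx = np.gradient(patch, axis=1)
    gy = np.gradient(patch, axis=0)
    # differential excitation: local intensity change relative to the centre pixel
    neighbour_diff = (np.roll(patch, 1, 0) + np.roll(patch, -1, 0) +
                      np.roll(patch, 1, 1) + np.roll(patch, -1, 1) - 4 * patch)
    excitation = np.arctan(neighbour_diff / (patch + 1e-6))
    orientation = np.arctan2(gy, gx)
    hist, _, _ = np.histogram2d(orientation.ravel(), excitation.ravel(),
                                bins=[n_orient, n_excitation],
                                range=[[-np.pi, np.pi], [-np.pi / 2, np.pi / 2]])
    return hist.ravel() / hist.sum()                         # 144 normalised values

patch = np.random.default_rng(0).random((48, 48))            # placeholder grey-level patch
print(gabor_features(patch).shape, weber_features(patch).shape)   # (60,) (144,)
```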

Step 4: The classification of the primitive features of each original patch is made automatically, and the patch features are grouped into clusters (i.e., “mathematical groupings”) using a Support Vector Machine (SVM) [16] with relevance feedback. The aim is to obtain a feature-based image patch classification by assigning a single semantic label to each patch using a user-oriented terminology of real-world classes. For the SVM, a chi-squared kernel is selected, and a one-against-all approach is used. The activities of the expert users are called “active learning”, referring to the interactive labelling of randomly presented positive and negative examples of target classes based on a proper visualisation of the individual patches, a visual comparison of the selected patches (using Google Earth maps in the case of EO images and reference health datasets in the case of medical images), and human expert judgements about the actual patch content.
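Conceptually, the active learning loop of this step can be sketched as below: in each round, the expert labels a few positive and negative examples of the target class, the chi-squared-kernel SVM is retrained, and the most ambiguous patches are queried next. The oracle function is a hypothetical stand-in for the human expert, and the number of rounds and the batch size are assumptions.

```python
# Conceptual sketch of the active-learning loop with relevance feedback (Step 4).
# The oracle stands in for the human expert (1 = target class, 0 = not).
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

def active_learning(features, oracle, n_rounds=10, batch=8, seed=0):
    rng = np.random.default_rng(seed)
    labelled = list(rng.choice(len(features), size=batch, replace=False))
    labels = [oracle(i) for i in labelled]
    while len(set(labels)) < 2:                   # make sure both classes are present
        extra = int(rng.integers(len(features)))
        if extra not in labelled:
            labelled.append(extra)
            labels.append(oracle(extra))
    svm = None
    for _ in range(n_rounds):
        K = chi2_kernel(features[labelled], features[labelled])
        svm = SVC(kernel="precomputed", C=10.0).fit(K, labels)
        K_all = chi2_kernel(features, features[labelled])
        margin = np.abs(svm.decision_function(K_all))
        # query the unlabelled patches closest to the decision boundary
        queries = [i for i in np.argsort(margin) if i not in labelled][:batch]
        labelled += queries
        labels += [oracle(i) for i in queries]
    return svm, labelled

# usage with placeholder features and a dummy oracle
feats = np.random.default_rng(1).random((500, 60))
svm, queried = active_learning(feats, oracle=lambda i: int(feats[i, 0] > 0.5))
print(len(queried), "patches labelled by the (simulated) expert")
```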

Step 5: Generate a set of patches that are semantically correctly labelled. This step is finished once all the given patches have been labelled. However, some patches may remain unlabelled. If the missing labelling represents problem cases, an expert user must identify the most probable class, or the data can be assigned to an unclassified class. There is already a hierarchical semantic labelling scheme in the case of EO images (see [16]). However, regarding the medical images, the experience of expert physicians is significant for defining the semantic labels allocated to the classes.

Step 6: Interpretation of the produced results. The first data product is the semantic classification results/maps of each image—or, in the case of image time series—the corresponding change maps. The second product is domain ontology representations [5], which help users extract the information and knowledge from the images. Finally, knowledge graphs can be created to explain the entire chain of relations (starting from the data, information extraction up to the semantic classes and their relations).

To summarise the process from Figure 1, the proposed methodology has two branches: one for Earth observation images and one for non-EO images. In detail, our approach consists of the following steps: acquire new data from one of the available EO satellites and store the EO image products in an archive, tile each image into patches with different sizes and integrate the appropriate metadata parameters, extract the primitive features from each patch (in this case, we used Gabor filters with 5 scales and 6 orientations), group them into classes using an active learning approach by applying a Support Vector Machine with relevance feedback, and provide a semantic label for each patch. Here, a list of already defined semantic labels exists, and the user can select an existing one or define a new one. Once EO experts have completed this process, users can create semantic classification maps of the given images. As with EO imagery, the procedure for medical images starts by acquiring images from various instruments, such as optical microscopy images in classical haematoxylin-eosin (HE) staining and computed tomography (CT) images, and storing them on a server. The following steps are similar; however, the size of the patches depends on the medical image parameters, and the primitive features are extracted with Weber local descriptors. Here, compared to the EO domain, the knowledge of field experts is very important in defining the semantic classes because there are no predefined labels. After the images have been classified, the next step is to create semantic maps of the medical images.

2.2. Dataset Description

This subsection is dedicated to selecting and describing the images used in this paper. The images are divided into two main categories, namely EO images and medical images. In the case of EO images, we considered the availability of the selected images, the diversity of locations, and the available sensors (radar or multispectral instruments). In terms of medical images, we considered images collected from haematoxylin-eosin (HE) pathological slides and computed tomography (CT) scans.

The data provided by each EO sensor are the images (in GeoTIFF or JPEG2000 format) and their metadata (in XML format). These metadata contain information about the acquisition parameters of the respective image (e.g., time of acquisition, incidence angle, resolution).

Similar to EO images, medical images are also accompanied by additional information such as clinical and demographic data (e.g., sex, weight, height, body mass index, area of residence) and relevant information from the medical history, including current chronic diseases, treatments, and risk factors (usually in TXT or CSV format, separate from the image data).

2.2.1. Earth Observation Datasets

From the large amount of EO data provided by different instruments [2,20], we selected six instruments that flew over the following cities: Berlin, Bucharest, Vancouver, and some areas between Albania and Greece, such as Corfu. We grouped the instruments into multispectral and SAR imagers. The data were received/provided via proposals and project agreements, or downloaded from the sensor imagery samples, and are sometimes subject to copyright rules, which we adhered to. From the wide variety of available and existing instruments, we selected the following ones (in alphabetical order): Gaofen-3 [21], RADARSAT-2 [22], Sentinel-1 [23], Sentinel-2 [24], TerraSAR-X [25], and WorldView-2 [26]. Table 2 shows the most important parameters of each selected instrument. More examples can be found in [27,28]. The dimensions of the EO images range from 5000 × 5000 pixels for TerraSAR-X, to 10,000 × 10,000 pixels for Sentinel-2, and even up to 25,000 × 16,000 pixels for Sentinel-1.

2.2.2. Medical Datasets

Medical images from various clinical and biomedical research fields are widely available online. For instance, an extensive collection of radiological medical images created for the validation of AI methods can be found in [29]. Unfortunately, these images are subject to sensitive personal data protection issues, so much of the clinical and demographic data and clinical history was unavailable [29]. The first two sets of images were acquired by optical microscopy, while the third set of images was acquired by computed tomography (CT) scanning. The number of selected images for demonstration depends on the diversity of medical cases identified in each set of images. Table 3 shows the essential parameters of each selected dataset. All medical images used in this study were fully anonymized prior to analysis. Public datasets (WebPathology and TCIA) were used under their respective open-access policies. A small number of additional de-identified pathology images were collected retrospectively under institutional IRB approval no. 1575/02.02.2024. No patient identifiers were accessible to the research team, and no additional informed consent was required.

(1). The first dataset of images is from 8 patients with colorectal adenocarcinoma; a total of 180 images were collected at different magnifications (5×, 10×, 20×). They were collected using an optical microscope equipped with an RGB camera and were selected by a medical expert, based on their expertise, to include typical normal structures of the colon wall and pathological structures commonly found in colon adenocarcinoma. Usually, the prototypic colorectal cancer is a well-to-moderately differentiated adenocarcinoma consisting of tubular, anastomosing, and branching glands in a desmoplastic stroma. The surface component may be ulcerated or show papillary or villous architecture. In addition, residual adenoma is often present at the edge of the tumour [30,31].

(2). The second dataset of images is from patients with lung tumours; there are 11,210 CT images and 25 pathology slices collected from 6 patients. From these, we selected 10 images from 2 patients with lung adenocarcinoma. Usually, lung adenocarcinomas show an admixture of many architectural patterns, such as acinar, papillary, micropapillary, lepidic, and solid growth patterns [32,33].

(3). The third dataset of images is extracted from a collection of 52,072 images from 422 patients with non-small cell lung cancer (NSCLC) [34]. For these patients, pre-treatment CT scans of the lung tumours, manual delineation by a radiation oncologist of the 3D gross tumour volume, and clinical outcome data are available in [31] for the Lung1 dataset. Typically, lung cancer pathology distinguishes two groups of cancer cells: small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). NSCLC is further divided into squamous cell cancer (SCC), large cell cancer, and lung adenocarcinoma. Finally, in situ (ISA) and invasive are the two types of lung adenocarcinoma.

3. Results

This paper is based on an “intelligent system”, which starts from the data and arrives at knowledge [35]. The system has two stages: (1) the first stage is to transform the data (in our case, the images) into actual information. The data is in the form of images; after several processing steps (see Figure 1), each patch tiled from the image comprises an associated feature vector and its corresponding metadata (e.g., geographical position, acquisition date, and instrument type). The image is now converted into information. Then, in the second stage, this information is transformed into knowledge about the image. Such transformation is triggered by the user who, with the help of his or her knowledge, classifies and groups the features of each patch into classes (categories) and then gives them meaning by defining semantics. One obtains, thus, a semantic catalogue for each domain [36]. This type of system is mono-directional, in which the information flows from the machine to the human. When all the results are stored in a database, each patch is associated with metadata, features, semantic labels, and a user ID. All this information/knowledge can be used to make various queries in the database (in the opposite direction), or different knowledge representation models can be defined.
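A hypothetical sketch of such a patch catalogue is shown below: each patch is stored together with its metadata, feature vector, semantic label, and user ID, and the archive can then be queried "in the opposite direction", from knowledge back to data. The table and column names are illustrative only, not the actual database schema of the system.

```python
# Hypothetical sketch of the patch catalogue described above; names are illustrative.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE patches (
    patch_id     INTEGER PRIMARY KEY,
    image_id     TEXT,     -- product or slide identifier
    sensor       TEXT,     -- e.g., 'Sentinel-2', 'TerraSAR-X', 'CT', 'microscope'
    acquired_on  TEXT,     -- acquisition date taken from the metadata
    features     TEXT,     -- JSON-encoded primitive feature vector
    label        TEXT,     -- semantic label assigned during active learning
    user_id      TEXT)""")

conn.execute("INSERT INTO patches VALUES (?, ?, ?, ?, ?, ?, ?)",
             (1, "S2_Bucharest_2021", "Sentinel-2", "2021-06-15",
              json.dumps([0.12] * 60), "high-density residential area", "expert_eo_01"))

# query from knowledge back to data: all patches with a given label and sensor
rows = conn.execute("""SELECT patch_id, image_id, acquired_on FROM patches
                       WHERE label = ? AND sensor = ?""",
                    ("high-density residential area", "Sentinel-2")).fetchall()
print(rows)
```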

3.1. Semantic Classification Based on the Extracted Information

3.1.1. Earth Observation Images

This sub-section explains the classification results obtained for each target area, considering that various instruments have acquired the data. The results show that the number of obtained classes depends on the instrument type, resolution, and patch size. For EO data, we used for comparison the same type of features, namely, the results of Gabor filtering, where we extracted 60 coefficients (see Step 3 in Section 2). A detailed analysis of the impact of the selected patch size and image resolution on the number of attainable classes has already been published in [34,35].

For a set of characteristic EO images, we selected from the available data four cities/areas, for which we could obtain simultaneous acquisitions of several instruments (SAR and multispectral). The investigation could be grouped into three categories: (1) instruments flew over the same city or a subarea of the city (e.g., Bucharest, Romania); (2) the acquired images mainly cover the same area of the city (e.g., Berlin, Germany) or different subareas of the city (e.g., Vancouver, Canada); (3) the acquired images cover the city together with its surrounding areas, up to the coverage of a neighbouring island (e.g., Corfu, Greece).

Bucharest [37] is analysed through the acquisitions made by TerraSAR-X and WorldView-2. After applying the method described in the Materials and Methods section, the semantic classification results show that the number of identified classes in the two images is the same (see Figure 2). However, in the case of multispectral data, the classification accuracy is higher [38]. The higher accuracy is due to (a) the better resolution of the WorldView-2 instrument (1.87 m instead of 2.9 m) and (b) the smaller patch size (100 × 100 pixels instead of 160 × 160 pixels). The smaller patch size reduces the area covered by each classified patch, thus avoiding a mix-up with other objects.

For Berlin (see Figure 3), the overlapping of the two images is partial, but parts of the city centre (e.g., Tiergarten Park and the Brandenburg Gate) are common. For comparison, we analysed two images acquired by similar types of SAR instruments. When we compare the classification results obtained by GF-3 and TerraSAR-X, we notice that even if GF-3 has a better resolution, the number of attainable surface cover classes is higher for TerraSAR-X due to the actual content of the image. The TerraSAR-X image also covers the northern part of the city, richer in content, and new categories appear, such as bridges, channels, etc.

For Vancouver (see Figure 4), we used images of two SAR instruments (TerraSAR-X and RADARSAT-2) covering separate parts of the city with different resolutions. The images show that there are situations in which the instruments cannot cover the same area of a city but only parts of it. However, this approach is helpful for repeated coverage of a city and its classification.

The images in Figure 5 were acquired by three different instruments (two SAR instruments and one multispectral instrument). These images were selected to obtain a reduced time interval between their acquisitions because a simultaneous acquisition was not possible due to the satellite revisiting times and—in the case of the multispectral instrument—due to cloud coverage of the investigated area.

When we analyse the results of the semantic classifications, we can observe the following: (a) comparing Sentinel-2 with Sentinel-1 (with resolutions of 10 m versus 20 m) reveals that the vegetation classes can be distinguished much better due to the higher resolution of the images and the additional multispectral bands (Sentinel-2); (b) comparing TerraSAR-X with Sentinel-1 (resolutions of 3 m versus 20 m) shows a more refined set of semantic classes found by the former instrument. Here, the patch contains more than 80% of the most prominent semantic class, compared to the Sentinel-1 case, where we have a more mixed content in the patch.

Selecting a specific class of interest, such as aquaculture (i.e., fish cages in coastal seas), shows that identifying this category depends on the instrument’s resolution. This aquaculture class can be seen in the three semantically classified images acquired with a resolution of up to 10 m (e.g., TerraSAR-X, Sentinel-2).

3.1.2. Medical Images

After the method was successfully validated for EO images, in this sub-section we validate the method/system for several medical images acquired from humans with various sensors (single-band or multi-band). Each image is explained based on the semantic classification results. We chose the most representative images per case for demonstration from each set of medical images. Similar results are obtained for the other images from the datasets.

For the medical data, we used for the comparison of each dataset the same type of features (see Step 3 in Materials and Methods). Based on the knowledge gained in the analysis of EO images, in the case of medical images, we tried different patch sizes (see Step 2 in Materials and Methods) for each set of medical images in order to better capture their contents and to describe the semantic classes as precisely as possible. Comparative results obtained using different patch sizes are presented in the following figures for each set of medical images. However, in the future, a more detailed study in this direction is necessary to identify the optimal patch size for each individual case.

To compare the results obtained with the method presented in Materials and Methods, we use an unsupervised method based on Latent Dirichlet Allocation [42], developed for the topic representation of EO images during the H2020 ExtremeEarth project. Here, the method is applied to the medical images (with a patch size of 4 × 4 pixels) selected from the three datasets in order to compare its results with those of the method from Materials and Methods.
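For orientation, the sketch below shows one possible realisation of such an unsupervised pipeline: small patches are quantised into visual words, each image becomes a document of word counts, and Latent Dirichlet Allocation recovers topics that play the role of classes. It is not necessarily the exact implementation of [42], and all sizes are placeholder values.

```python
# One possible realisation of the unsupervised comparison method (not necessarily
# the exact pipeline of [42]); all sizes below are placeholder values.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
# placeholder data: 20 images, each described by 2000 descriptors of 4 x 4 patches
patch_descriptors = [rng.random((2000, 16)) for _ in range(20)]

# 1) build a visual vocabulary from all patch descriptors
vocab = KMeans(n_clusters=64, n_init=5, random_state=0).fit(np.vstack(patch_descriptors))

# 2) bag-of-visual-words counts per image (documents x words)
counts = np.zeros((len(patch_descriptors), 64), dtype=int)
for i, desc in enumerate(patch_descriptors):
    words, freq = np.unique(vocab.predict(desc), return_counts=True)
    counts[i, words] = freq

# 3) fit LDA with a chosen number of topics (e.g., 12, as in the CT experiments)
lda = LatentDirichletAllocation(n_components=12, random_state=0)
topic_mixture = lda.fit_transform(counts)
print(topic_mixture.shape)   # (20, 12): topic proportions per image
```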

The first medical dataset analysed is that of the colorectal adenocarcinoma, acquired using optical microscopy. From the dataset, we chose the most representative images. For this set of images, the patch size was reduced, adapting it to the resolution and content of the image. We compared the classification results using three different patch sizes: 48 × 48 pixels (with a total number of patches equal to 336), 24 × 24 pixels (with a total number of patches equal to 1344), and 16 × 16 pixels (with a total number of patches equal to 3072). The results of these different patch sizes (for the image with the ID KS-172) are presented in Figure 6 and Figure 7. For comparison, these results are set against the LDA method with a patch size of 4 × 4 pixels (with a total number of patches equal to 49,152).
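The reported patch counts are mutually consistent if the KS-172 image has a size of 1024 × 768 pixels (an assumption inferred from the counts themselves, not stated in the text), as the short check below shows.

```python
# Quick consistency check of the reported patch counts, assuming the KS-172 image
# is 1024 x 768 pixels (inferred from the counts, not stated above).
width, height = 1024, 768
for patch in (48, 24, 16, 4):
    print(patch, (width // patch) * (height // patch))
# prints: 48 336, 24 1344, 16 3072, 4 49152
```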

With a larger patch size, a larger area of the image is incorporated; therefore, more objects can be included in it (see Figure 6 and Figure 7), which is not desired for a good classification. By reducing the size of the patch (see Figure 8), we can obtain a better classification because the patch then includes only one class.

In Figure 6 and Figure 7 (top part), class 5 around class 1 is the adventitia tunic around a blood vessel.

Regarding Figure 6, Figure 7 and Figure 8 (bottom part), when class 1 is inside the blood vessel, it is the lumen (an empty space); when it is in the submucosa, it represents the loose connective tissue. For class 2, when it is around the blood vessels, it represents their wall (tunica adventitia, muscular, intima), and when it is outside the crypt areas, it is the connective tissue.

Based on the results obtained in Figure 6, Figure 7 and Figure 8 and the expert observations, it can be seen that for a more detailed/finer classification, a smaller patch is required in order to retrieve the small areas/cells.

Further, the other selected images for evaluation (KS-031, KS-032, KS-040, KS-042, KS-145, KS-165, and KS-168) were classified. The results were similar to those in Figure 7 (bottom part) and are not presented here due to space limitations.

Analysing the results from the above figures (Figure 6, Figure 7 and Figure 8), we can see that the proposed method works very well when the size of the patch is smaller. In the unsupervised case, the results are not as good (see Figure 8). What is very important in the case of medical images, due to the complexity of the image content, is that an expert (e.g., a doctor) and their domain knowledge are needed in order to classify the image, group the patches into classes, and further provide the semantic meaning of the retrieved classes.

In the histopathology image of Figure 6 (top left), there are three structurally distinct areas: the structurally normal glandular zone, adipose tissue with blood vessels surrounded by connective tissue, where disturbances in organisation are identified (Figure 6, top right). In the case of the blood vessels, a distinction was made between the vascular walls and the interior of the vessel, with a clear differentiation within the latter between the area occupied by blood cells (predominantly erythrocytes) and the empty intraluminal space. There is an error zone, where classes related to a quasi-homogeneous pinkish area representing a fibrinogen accumulation zone are mixed. Small-sized areas are incorrectly identified as glandular tissue in perivascular regions and as vascular wall areas in the dominant glandular region. Due to their small size, it is difficult to achieve continuity of the vascular wall. Dividing the same image into 24 × 24 patches led to some errors (Figure 6, bottom right). The vascular wall was incorrectly identified as connective tissue, and the vascular lumen as adipose tissue. Additionally, there were a few patches where connective tissue was misidentified as glandular tissue. Reducing the patch size to 16 × 16 (Figure 7) allowed for the identification of vascular structures but still did not distinguish between the vascular wall and the blood-filled lumen. Moreover, the periglandular areas identified as vascular structures expanded. A positive aspect is the ability to differentiate between loose and dense connective tissue. However, distinguishing between adipose tissues and empty intraluminal spaces remains impossible. Unsupervised learning identifies a larger number of classes (Figure 8), but there are significant limitations regarding the distinction of areas with different histological characteristics. Confusions arise between areas of connective tissue and vascular wall areas, even inside the intraluminal blood clot, between the glandular lumen and areas of connective tissue, and so on.

Figure 9 displays images selected from the entire dataset from which we are taking the sub-images for our investigation. These areas are marked with a green rectangle and annotated from 1 to 19. The index of each sub-image is marked in the lower left corner of the green rectangle.

We compared the classification results using four different patch sizes: 48 × 48 pixels (with a total number of patches equal to 288), 32 × 32 pixels (with a total number of patches equal to 675), 16 × 16 pixels (with a total number of patches equal to 2750), and 8 × 8 pixels (with a total number of patches equal to 11,100). This comparison was performed for the sub-image with the ID 15 (see Figure 10 and Figure 11), and we can say that the best results, in terms of the number of retrieved classes and the visual comparison (made by a specialist doctor) between the original and classified images, are those obtained when the patch size is 8 × 8 pixels.

For the same sub-image with the ID 15, the method in [42] is applied with different numbers of classes/topics (e.g., 6, 10, and 12) and a patch size of 4 × 4 pixels (with a total number of patches equal to 44,400). The results are presented in Figure 12.

Similar results are obtained for the other representative sub-images selected from Figure 9 and marked from 1 to 19. Figure 10 (top left) shows the image of a lung parenchyma with inflammatory infiltrate, in which a blood vessel and a few airways are being destroyed. The analysis of the image, starting from patches of 48 × 48 pixels, distinguishes these areas relatively correctly, with a few errors related to some areas of the lung parenchyma richer in connective tissue, which are mistakenly identified as vascular walls. It is noteworthy that a distinction can be made between the empty intraparenchymal areas that may correspond to airways (bronchi of various orders). Figure 10 (bottom left) represents the result of the analysis with smaller patches (32 × 32 pixels).

It is notable that there is a clear distinction, mentioned above, between the empty intraparenchymal space and the empty space at the edge of the area occupied by the tissue sample on the slide. The only notable error remains the identification of some areas of connective parenchymal tissue as vascular walls. Figure 11 (top left) represents the result of the analysis using patches of size 16 × 16 pixels. The error related to the incorrect identification of areas of connective parenchymal tissue as vascular wall persists, as can be seen in Figure 11 (top left).

Figure 11 (bottom, left) represents the processing of the image using patches of 8 × 8 pixels. The details are more numerous, and the errors are smaller. Various types of classes could be identified, including areas of connective intraparenchymal tissue. The areas where there are errors related to the identification of intraparenchymal connective tissue as the vascular wall are small in size. The results of unsupervised learning become increasingly inefficient as the size of the patch used is smaller and the number of classes is larger, as shown in the images from Figure 12 (right).

The third medical dataset analysed is that of lung adenocarcinoma, which was acquired using CT. The patients, whose CT images are available in the third dataset and analysed here, are the same as the patients in the second dataset. From this dataset, we chose the most representative images, which contain different anomalies (suspicious or cancerous areas) and need to be analysed.

The following figures show images selected from the entire dataset and used in our investigation. We compared the classification results using four different patch sizes, 48 × 48 pixels (with a total number of patches equal to 140), 32 × 32 pixels (with a total number of patches equal to 315), 24 × 24 pixels (with a total number of patches equal to 609), and 16 × 16 pixels (with a total number of patches equal to 1333). The results of this comparison are given in Figure 13 and Figure 14, and we can say that, in the case of CT images, the best results are for the classification with a patch of 16 × 16 pixels, which not only has the highest number of classes retrieved by the method but also benefits from the expertise of the user (in this case, the specialist doctor).

Based on this result, Figure 15 shows the analysis of two other CT images: one collected from a patient with no diseases and one from a patient with a cancerous tumour. For the same set of CT images selected from the dataset, the unsupervised method [42] is applied (with the number of classes/topics equal to 12) with a patch size of 4 × 4 pixels (with a total number of patches equal to 30,800); the results are presented in Figure 16. The training of the algorithm on computed tomography images allows the introduction of multiple classes representing anatomical structures and regions of the thorax with relatively few errors (Figure 13, top left). Figure 13 (top right) shows a CT image of the transverse section of the thorax at the level of T5, where the right lung is identified and a large-sized tumour is in the middle. With proper preparation, these tumours can be delimited without errors of misidentifying other structures or tumour tissue. Keeping the same number of classes (of anatomical structures) but reducing the size of the patches results in improved delineation of these areas (Figure 13 and Figure 14, bottom left). Reducing the patch size to 24 × 24 and 16 × 16 allows for refinement in the delineation of anatomical regions in the thorax (Figure 14). Patches with a size of 16 pixels are optimal for delineating structures and anatomical regions in sections where the tumour is absent (Figure 15, top) or present (Figure 15, bottom). Also, it is noted that errors in identifying structures are greatly reduced, regardless of the sizes of the patches used in the analysis. Unsupervised learning using 4-pixel patches is not effective, either in cases where the tumour is absent or in those where it is present (Figure 16), even though the number of classes is the same as in supervised learning.

3.2. Knowledge Representation

Daily, through the Copernicus Programme, a huge amount of data (e.g., 12 TB of satellite images per day) is acquired, much of which remains unexplored in archives. This is a typical case where methods/algorithms are needed to explore these archives and extract useful information. One of the methods that can be applied for this purpose is based on active learning, with human interaction from a user who can guide the search and the semantic annotation in the desired direction by extracting the information necessary for a given application [5].

In this section, we use the output of the proposed method (the semantic classes) in order to efficiently exploit the information/knowledge carried by the semantic labels and to analyse the relations between the different labels in a model of each observed target area. Based on the collected information, Figure 17, Figure 18 and Figure 19 show models built from the semantic classes for Bucharest (Figure 17), Berlin (Figure 18), and Vancouver (Figure 19); these can lead to the creation of a city model that considers two major aspects: How green is the city? and How industrialised/populated is the city?
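As an illustration of how such a city model could be encoded, the sketch below assembles a small knowledge graph from the semantic classification output; the class names, coverage figures, and indicator definitions are invented placeholders rather than the values behind Figures 17-19.

```python
# Illustrative sketch of a simple city model assembled as a knowledge graph from
# the semantic classification output; all class names and numbers are placeholders.
import networkx as nx

def build_city_model(city, class_coverage):
    """class_coverage: mapping semantic label -> fraction of classified patches."""
    g = nx.DiGraph()
    g.add_node(city, type="city")
    for label, share in class_coverage.items():
        g.add_node(label, type="semantic_class")
        g.add_edge(city, label, coverage=share)
    green = sum(s for l, s in class_coverage.items()
                if l in {"forest", "park", "agriculture"})
    built = sum(s for l, s in class_coverage.items()
                if l in {"industrial area", "high-density residential area"})
    g.add_edge(city, "greenness_indicator", value=round(green, 2))
    g.add_edge(city, "industrialisation_indicator", value=round(built, 2))
    return g

model = build_city_model("Bucharest", {"park": 0.12, "forest": 0.05, "lake": 0.03,
                                       "high-density residential area": 0.40,
                                       "industrial area": 0.15, "roads": 0.10})
print(model.edges(data=True))
```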

Similar to the creation of the city model, a model of the patient or patients from the same city/region can be created. The next figure (Figure 20) shows such a model of two patients with cancer from the same city.

4. Discussion

4.1. Machine Learning Systems and Semantic Labelling

Annotation systems were first proposed for multimedia data. An example is the online LabelMe tool [43] for annotating pictures. Several research groups created systems for image understanding and annotation [44], including web engines in remote sensing. In the EO domain, most of the well-known systems are based on machine learning techniques (e.g., content-based information retrieval [45], data mining [46], deep neural networks [47], and hybrid semi-supervised techniques [48]). We would like to mention here LandEX [49], GeoIRIS [45], EOLib [50], and CANDELA [27].

In recent years, with the advancement of artificial intelligence, especially machine learning techniques [51], these techniques have become more visible in the medical field [52], with the main direct goal of better diagnosis and the elimination of diagnostic errors [53]. To the best of our knowledge, most of the existing systems in the medical field use segmentation techniques to separate different objects/classes in an image or use a content-based information retrieval approach to identify similar images. However, a system like the one we propose for medicine, based on an active learning approach (using a data mining technique), is not yet available.

The present study was designed as a proof of concept to explore the transferability of semantic labelling and active learning techniques from the field of Earth observation to medical imaging. Unlike deep learning-based approaches that dominate current medical image analysis, our system emphasises interpretability and integration with environmental datasets via knowledge graphs. While our results demonstrate feasibility, we do not claim superiority over mainstream methods. A comprehensive benchmarking study comparing our approach with state-of-the-art deep learning algorithms will be an important next step once larger and clinically curated datasets become available. Our approach is not exhaustive in terms of medical cases, but it is systematic in testing the capabilities of the methodology. We found that this method is quite efficient at the histological level and remarkably efficient for delineating organs, structures, and anatomical regions based on CT images. This CT image analysis capability allows the methodology to be combined with radiomics, as we show further on.

There are already predefined semantic annotation schemes for labelling the EO data in remote sensing (including radar and multispectral images). One of them is presented in [5] and is further applied in this paper. From about 300 radar images, 354,000 patches were generated, to which 53 semantic labels have been assigned, using a data mining system and the knowledge of an expert user in the field. To this number of labels can be added the cases where two or more semantic classes are assigned to a patch in a multi-labelling approach (e.g., channels and high-density residential areas), plus the class Unclassified, which is assigned to the patches for which no suitable label was found.

The number of semantic classes identified in an EO image varies between 10 and 20 classes; this depends on the image’s location and content. From our experience [1,28,54], a larger number of semantic classes is identified in metropolitan areas.

In contrast, the number of semantic classes in the medical field is smaller but with much more complex image content. In this case, in the definition of semantic classes, the knowledge provided by an expert in oncological diagnostics is critical.

Graph-based representations of data, their interrelationships, and their linking with higher-level knowledge have already been proposed by several authors [55,56,57]. Typical applications are graph data for graph visualisation, graph matching, graph patterns and substructures, pattern and grammar learning, decision trees, and graph mining.

An essential step in using graphs for image content interpretation was the introduction of knowledge graphs by Google [58]. We can assume that the big commercial search engines (such as Bing and Wikipedia) rely on knowledge graphs in their search routines when they combine higher-level knowledge expressed as ontologies with digital information such as images.

4.2. Knowledge Graphs and Interpretability

The scientific goal of knowledge graphs [59] is to select image data combined with additional information and generate higher-level interpretation results from them. There are already several available knowledge graphs, among which we would like to mention Google Knowledge Graph, Wikidata, DBpedia, and YAGO [60]. A survey on knowledge graphs has been published by [61], while a detailed scientific paper dealing with knowledge graphs for remote sensing images has been published by [62]. A study paper for health knowledge graphs built from medical records has been presented in [63].

Although machine learning methods have reached a high level of maturity in many fields (e.g., computer vision, remote sensing), there is still a need for the explainability and trustworthiness of developed methods. Recently, in this domain, there have been some efforts to create explainable machine learning (XML) models, which are part of the larger domain of explainable artificial intelligence [64,65,66]. The three concepts that govern XML are interpretability, transparency, and explainability.

A first step in expanding the knowledge from EO to medicine was taken within the previous TELEIOS research project [67], where a data mining “laboratory” system was tested successfully with types of images other than the EO images for which it was developed. The image was an optical microscopy image of stomach tissue collected from a herbivorous animal. The RGB image, with a size of 3136 × 2352 pixels, was tiled into 1131 patches of 80 × 80 pixels. Based on this system, six semantic classes were extracted.

4.3. From AI for EO to AI for Health: The Case of Lung and Colorectal Cancers

The approach proposed in this paper for the medical field is an unexplored opportunity, as became apparent when we compared the number of publications in remote sensing with those in medicine. This study was conducted on the IEEE Xplore website, an extensive database of high-impact articles, by querying keywords related to the two fields. The search was performed for keywords such as “machine learning” combined with “remote sensing” or “medical images” (see Figure 21a). Another comparison was made between “machine learning” combined with “Earth observation” or “computed tomography” (see Figure 21b). Finally, among the available machine learning techniques, our selected method, “active learning”, was compared in terms of the number of existing publications on “Earth observation” and “computed tomography” (see Figure 21c). The main observation of this study is that research in the field of Earth observation is much more developed than in the medical field (in terms of the number of available publications). This conclusion is an additional reason that motivates us to find solutions applicable to the medical field.

The state of the art in the field of AI4EO (artificial intelligence for Earth observation) has already been covered in many papers published in recent years in the field of remote sensing; for this reason, we will not dwell on this topic in this paper, but we mention a series of articles on it [68,69,70].

AI and deep learning have been applied in computer-aided diagnosis and research, allowing for advanced analysis and learning through simulations of the human brain [71]. The applications of AI in lung cancer are diverse and include tasks such as segmentation, detection, cell counting, and gene mutation prediction, demonstrating their potential in improving various aspects of lung cancer diagnosis and treatment [72].

In the context of lung diseases, there is a high global burden that encompasses a spectrum of diseases, including cancer, tuberculosis, idiopathic pulmonary fibrosis, and COVID-19 [73]. Among these, lung cancer stands out as a major cause of cancer-related death [74]. Furthermore, AI has shown promise in the quantification of imaging biomarkers, which are essential for the diagnosis, risk stratification, and assessment of treatment responses in lung cancer patients [73], while other emerging applications include areas like multimodal data analysis, 3D pathology, and transplant rejection prediction [73]. In thoracic imaging, AI has played a pivotal role, with deep learning techniques driving significant progress [75]. Deep learning (DL) algorithms have been developed for various lung cancer-related tasks, underlining their importance in the domain [76]. These algorithms have had a profound impact on medical image analysis, especially in the context of lung imaging [77]. However, it is important to be aware of possible bias in supervised deep learning algorithms used for CT lung nodule detection and classification, highlighting the need for ongoing research and refinement in this field [78]. The potential of DL in lung cancer extends from screening to prognostication, showcasing its versatility and significance [76].

On the other hand, colorectal diseases, with a particular focus on colorectal cancer (CRC), are a significant health concern with high incidence, morbidity, and mortality rates, as highlighted in several studies [74,79,80,81]. AI plays an expansive role in the diagnosis, treatment, and prognosis of CRC being applied in various fields, including imaging, endoscopy, and pathology [82]. CRC is the second most common cancer in women and the third most common in men [83]. This prevalence is accompanied by increasing incidence rates, which present growing diagnostic challenges in the field of colorectal cancer [83]. Various diagnostic methods—including excreta and blood tests, colonoscopy biopsy samples [84], computer-aided endoscopy, and medical imaging—are employed in the diagnosis of colorectal cancer [74], since early and accurate diagnosis is of paramount importance in improving survival rates, increasing cure rates, reducing mortality, and minimising medical costs associated with the disease. Thus, in the context of treatment decisions for colorectal cancer, the preoperative assessments of a large variety of biological variables [81] hold significant importance as they guide personalised treatment strategies for individuals with colorectal cancer [79]. Neoadjuvant chemoradiotherapy is the standard treatment for Locally Advanced Rectal Cancer (LARC) [85], while early evaluation of colorectal cancer liver metastasis (CRCLM) is crucial for determining treatment strategies and improving survival outcomes in patients who are beyond the locally advanced disease [86]. While CT is also utilised in the detection of distant metastases [81], MRI serves as a crucial diagnostic tool, offering accurate evaluation of tumour location, local staging, restaging, invasion depth assessment, localization of radiotherapy, prediction of chemotherapy response, detection of high-risk factors, and prognosis assessment [81,87]. To evaluate the diagnostic accuracy of AI in detecting lymph node metastasis in CRC, systematic reviews have been conducted with a focus on radiomics and deep learning in CT/MRI studies [88]. The results indicate that while radiomics exhibits high heterogeneity, deep learning, although less prevalent, proves to be effective. In particular, AI models, with a special emphasis on deep learning, demonstrate the potential for accurate prediction of lymph node metastasis in CRC [88]. Additionally, DL techniques have emerged as valuable tools in histopathology, offering diagnosis assistance in CRC diagnosis and the ability to predict molecular phenotypes, prognostic features, and even assess the tumour microenvironment, all of which contribute to a deeper understanding and improved management of colorectal diseases [83].

4.4. Limitations and Perspectives of AI Applications

AI tools for lung cancer pathology have evolved, incorporating hand-crafted and deep learning-based unsupervised features [73]. This evolution aligns with the emergence of digital pathology (DP), driven by advancements in computational power and whole-slide imaging technology [72]. DP, coupled with AI tools, is aiding pathologists and pulmonologists in various aspects of their work, from remote support to routine diagnosis [73]. The synergy between AI and pathology continues to advance the field, with promising implications for the accurate diagnosis and treatment of lung cancer.

Recent developments in the field show promise for CRC pathological analysis, emphasising the importance of accurate diagnosis and treatment planning [84]. Goals in this context include the removal of pre-cancerous polyps and reducing risks associated with unnecessary polypectomies, highlighting the need for precision in medical interventions [80]. Artificial intelligence, in particular, is emerging as a significant advancement in the field, with the potential to revolutionise clinical decision-making and improve outcomes for patients with colorectal diseases, especially CRC [89]. There is a growing emphasis on AI leveraging in colonoscopy, specifically in the areas of polyp detection and characterisation [80] or image mining and analysis for new insights into tumour biology, which is directly impacting clinical practice and decision-making [89].

Studies in this domain have explored various model features, including gland segmentation, tumour classification, microenvironment characterisation, and prognosis prediction [84]. While other models show promise, many are still in their early stages of development. New developments in CRC research are focusing on morphological biomarkers, dynamic evaluation of metastases, genetics, and the role of the liver–tumour interface, all of which contribute to a deeper understanding of tumour biology [89].

4.5. Limitations of This Study

The reason why, within our study, there are identification errors when using pathology slides with conventional haematoxylin–eosin staining is the presence on the slide of areas where the pink–violet hues are very close in tone. These areas are physically smaller in size compared to the others (usually with less clear contours). Usually, errors occur interspersed in extensive areas that actually correspond to other structures. The level of detail is dependent on the size of the patches used in supervised learning.

Moreover, errors in delimiting small vessels and airways can occur following tissue processing for histopathological analysis. In pathology, the processing of tissues for optical microscopy observations involves several steps, each of which can induce changes (alterations) in blood vessels, airways, and other tissue structures. The main processing steps and the associated changes are as follows:

Fixation using chemicals such as formalin to stabilise proteins and cellular structures to prevent autolysis and degradation. This process can induce artefacts by coagulating proteins and altering the appearance of blood vessels, causing contractions or stiffening.

Dehydration of tissue samples with increasing concentrations of alcohol, which can lead to a reduction in the volume of blood vessels and small airways and their collapse, thus altering their microscopic appearance.

Clearing, in which tissues are passed through clearing solutions (usually xylene or toluene), making them transparent and ready for paraffin infiltration. This can lead to additional alterations, such as vessel collapse and changes in the spatial relationships between structures, including lung airways.

Paraffin infiltration leading to mechanical artefacts, such as distortion or displacement of tissue structures, including blood vessels. This process can make vessels and airways appear more rigid and collapsed than in their natural state.

Microtomy or fine cutting of thin sections for microscopy, which can induce mechanical artefacts, such as cracking or distorting blood vessels. Sections may show compressed or deformed vessels and airways due to the pressure of the microtome blade.

Staining involving various chemicals (e.g., haematoxylin and eosin), which can accentuate or blur certain structural details of blood vessels. Sometimes, dyes can cause precipitates or other colouring artefacts, which can mask or alter the natural appearance of the vessels.

Fitting the tissue ultrathin sections onto the slides and covering them with another slide can induce mechanical pressure, which could compress or deform the blood vessels and airways.

The smaller the patch size, the more faithful the reproduction. It should be noted that some areas, admittedly small in size, are incorrectly identified in the case of pathology slides, regardless of the size of the analysis patch. As a general rule, however, it can be deduced that the choice of analysis patch size depends on the objective of the analysis, i.e., the structures of interest for diagnosis and the precision required in their identification and delineation. A study investigating the influence of the patch size in relation to the EO image (resolution, size of objects in the EO image) can be found in [11].

This process is used for the conjugate analysis of computed tomography (CT) and/or magnetic resonance imaging (MRI) in conjunction with the analysis of pathological slides to identify biomarkers that can be extracted from texture analysis. Texture analysis in pathological images, at different organ levels (CT/MRI) and in tissue (pathological slide), provides a series of biomarkers that can be independent or correlated. If histopathological biomarkers are linked to histological type, degree of differentiation, and degree of invasion (perineural, intravascular), they can be correlated with imaging biomarkers (radiomics). This entails a shift in the diagnostic paradigm, whose integration into clinical practice is difficult to formulate, specify, or predict. However, our analysis can be complementary and even integrated into radiomics, with quite broad perspectives, especially if a supervised learning scheme can be defined in which relationships between patches are built beyond the semantic classification. We think, first of all, of topological relationships that would allow errors to be reduced by relating a patch to its neighbouring patches when the analysis of its content alone would lead to a completely different classification than that of the patches in the analysed area.

Medical images also remain to be explored under various acquisition conditions. For example, CT images can be examined in different windows corresponding to optimal visualisation, depending on the anatomical region where the object is located (for instance, the chest window, the brain window, etc.), with or without contrast. Images from pathological slides can also be analysed at different magnifications and with stains other than the usual haematoxylin–eosin staining, such as immunohistochemical stains. For both types of medical images mentioned, a rigorous definition of the classes belonging to healthy tissues and of those corresponding to lesions is necessary. Last but not least, it is necessary to define criteria that are not only technical but also related to obtaining relevant information for diagnosis, so that the various technical solutions of digital pathology can be compared and evaluated in the form of benchmarking.
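
As an illustration of the CT windowing mentioned above, the following minimal Python sketch maps Hounsfield units to grey levels for a given display window; the window centre/width values in the example are typical textbook settings and are not taken from our datasets:

```python
import numpy as np

def apply_ct_window(hu: np.ndarray, centre: float, width: float) -> np.ndarray:
    """Map Hounsfield units to 8-bit grey levels for a given display window."""
    lower, upper = centre - width / 2.0, centre + width / 2.0
    windowed = np.clip(hu, lower, upper)
    return ((windowed - lower) / (upper - lower) * 255.0).astype(np.uint8)

# Typical (textbook) settings: lung window (centre -600, width 1500),
# brain window (centre 40, width 80).
hu_slice = np.random.randint(-1000, 1000, size=(512, 512))
lung_view = apply_ct_window(hu_slice, centre=-600, width=1500)
brain_view = apply_ct_window(hu_slice, centre=40, width=80)
```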

The small sample size of the medical datasets used in this study represents an inherent limitation. As this work was designed as a proof-of-concept demonstration, the reported accuracy values are intended only to show the feasibility of applying EO-inspired semantic labelling to medical images, and not to establish clinical performance benchmarks. Future studies will involve significantly larger, multicentre datasets, enabling statistical hypothesis testing and more robust validation of the methodology.

However, the results of this study could help us to identify possible reasons why one city has a higher number of patients with a particular disease (e.g., patients with cancer or schizophrenia) compared to another city where this number is smaller, considering the structure of the city (e.g., the extent of green areas, the number of hospitals, the extent of industrial areas). Such a perspective could be ground-breaking in medical epidemiology. These are the premises for a correlation between the information extracted from the Earth observation images related to environmental conditions and the information in medical images (such as radiological or histopathological disease phenotypes), providing in-depth knowledge about environment-related disease aetiologies. Furthermore, this analysis can include the temporal component of all types of images to study the evolution of the cities/areas or of the patients over time. We would also like to mention that, to the best of our knowledge, such an approach has not been reported in the literature so far.

The results of the Earth observation image analysis should answer questions such as: How green is the city? How many parks are in the city? Is there a forest or a lake in the city or nearby? How many hospitals are there in the area? Is it an area with a high density of houses? Is it an industrialised city? This information should be grouped into a semantic model of the city/region, obtained based on these images and further validated using existing reference datasets (e.g., in situ measurements, CORINE Land Cover, Urban Atlas, OpenStreetMap), as well as the experience of remote sensing experts in the field.
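
As a simple illustration of how such a semantic model can be aggregated into city-level indicators, the following Python sketch computes the fraction of patches belonging to "green" classes; the class names and their grouping are hypothetical placeholders, not the label set used in this paper:

```python
from collections import Counter

# Hypothetical per-patch semantic labels produced by the classification step.
patch_labels = ["forest", "grassland", "high-density residential",
                "roads", "lake", "grassland", "industrial area"]

green_classes = {"forest", "grassland", "lake"}   # assumed grouping of "green" classes
counts = Counter(patch_labels)
green_fraction = sum(counts[c] for c in green_classes) / len(patch_labels)
print(f"Green-area fraction of the scene: {green_fraction:.0%}")
```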

Similarly, a semantic medical model can be obtained for a particular disease or for several diseases by analysing the medical images. These results are validated by comparing them with previous hospital records and with the physicians' expertise.

In creating a global health model of a city/area (see Figure 22), other information can be added to the previous two models. For instance, the Copernicus Sentinel-5P mission performs atmospheric measurements with a high spatio-temporal resolution. Thus, one can easily add the pollution level of the area, ozone levels, UV radiation, and climate monitoring and forecasting. Additional data layers can also be added from the Copernicus Sentinel-3 mission dedicated to water vapour absorption and atmospheric and aerosol monitoring. Figure 23 and Figure 24 show the domain ontology and the knowledge graph [5] created for the global health model of a city. Moreover, the temporal component of the images can be added to the model, helping to observe the changes in a city. These urbanism or pollution features can also be linked to the medical information about monitored patients who, for example, are recovering from an illness.
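
As a hedged illustration of the knowledge-graph representation shown in Figure 24, the following Python sketch (assuming the networkx library; all node and relation names are illustrative, not taken from our ontology) links a city, its EO-derived land cover, an atmospheric measurement, and a patient finding:

```python
import networkx as nx

g = nx.DiGraph()
g.add_edge("Bucharest", "green areas", relation="hasLandCover")
g.add_edge("Bucharest", "NO2 level", relation="hasAtmosphericMeasure")
g.add_edge("NO2 level", "Sentinel-5P", relation="measuredBy")
g.add_edge("Patient 1", "lung tumour", relation="hasFinding")
g.add_edge("Patient 1", "Bucharest", relation="livesIn")

# Print the labelled relations of this toy global health graph.
for source, target, data in g.edges(data=True):
    print(f"{source} --{data['relation']}--> {target}")
```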

Although this study does not include a full epidemiological case study, the unified methodology we propose has significant potential for future public health applications. By using identical semantic labelling workflows for both environmental and medical datasets, it becomes possible to generate geographically resolved maps that link environmental risk factors (e.g., pollution levels, urbanisation patterns) with disease phenotypes observed in medical images. Future work will focus on pilot studies, in which lung cancer imaging datasets will be correlated with EO-derived air quality data, as well as exploring links between colorectal cancer patterns and environmental indicators such as water quality and agricultural practices. These applications will require large, geographically indexed datasets and collaboration with public health institutions.

5. Conclusions

The described system comprises novel concepts to help Earth observation, medical, and other data users access and discover general information from large archives of images, and quickly collect the desired detailed information to act accordingly. This involves dealing with complicated spatial, structural, and temporal relationships between land surface categories and objects appearing in the image.

In this paper, we have shown the usefulness and the adaptation of the proposed method for the semantic labelling of various types of images acquired by different instruments.

In the first case, we considered EO images acquired by five different instruments covering four cities from different parts of the world. The method described in Figure 1 (top part) was successfully applied, and the remote sensing experts were able to semantically label these images with predefined labels [54], with an accuracy of 95% [11]. An important observation is that the better the sensor resolution, the more semantic classes can be distinguished and the more accurate the classification results become. Also, the appropriate patch size depends on the sensor resolution; the optimal patch size vs. resolution is presented in [11,12]. A detailed comparison of different methods (e.g., CNN, auto-encoder, hybrid methods, etc.) applied to the EO images was performed in [12].
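
The dependence of the patch size on the sensor resolution can be made explicit by the ground footprint of a patch, i.e., the patch size in pixels multiplied by the pixel spacing; the following short sketch uses the values from the EO image table of this paper:

```python
def patch_footprint_m(patch_size_px: int, resolution_m: float) -> float:
    """Ground footprint (side length in metres) covered by one patch."""
    return patch_size_px * resolution_m

# Values taken from the EO image table of this paper.
print(patch_footprint_m(160, 2.9))   # TerraSAR-X MGD: 464 m per patch side
print(patch_footprint_m(256, 1.0))   # Gaofen-3 SpotLight: 256 m per patch side
print(patch_footprint_m(128, 20.0))  # Sentinel-1 IW: 2560 m per patch side
```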

In the second case, we considered medical images acquired with four different instruments. The method described in Figure 1 (bottom part) was successfully applied to medical images without major software modifications. Here, the expected image content is already considered when choosing an appropriate patch size and an efficient feature extraction technique. The grouping of the patches into semantic classes was made interactively by several experts in the field (physicians), who gave meaning to the extracted classes by assigning a best-fitting semantic label to each class (in this case, we used no predefined semantic labels, unlike for the EO images). The visual accuracy obtained for these semantic classes lay between 80% and 85% (based on the feedback collected from the doctors). We point out that, for medical images, a more detailed study must be performed on the optimal patch size and the features to be extracted, which depend on the type of instrument used to acquire the medical image and on the content of the image.
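
As a minimal sketch of the SVM configuration summarised in the parameter table below (chi-squared kernel, one-against-all strategy, C = 1.0, gamma = 1/number of features), assuming scikit-learn and random placeholder features rather than the actual patch descriptors, one could write:

```python
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
features = rng.random((200, 32))          # non-negative placeholder patch features
labels = rng.integers(0, 5, size=200)     # 5 hypothetical semantic classes

gamma = 1.0 / features.shape[1]           # gamma = 1 / number of features
gram = chi2_kernel(features, gamma=gamma) # precomputed chi-squared kernel matrix

clf = OneVsRestClassifier(SVC(kernel="precomputed", C=1.0))  # one-against-all
clf.fit(gram, labels)
print(clf.predict(chi2_kernel(features[:3], features, gamma=gamma)))
```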

The reason behind the lower (visual) accuracy of the unsupervised learning may be that the relevant features are not captured by 4 × 4 pixel patches. In the case of the medical images, this patch size needs to be adjusted according to the resolution and the size of the objects (the class content is too small). Regarding the EO images, a patch of 4 × 4 pixels gives good results with this method [42].

Author Contributions

Conceptualization, L.B. and C.O.D.; methodology, L.B. and C.O.D.; software, C.O.D.; validation, A.D. and F.A.; formal analysis, L.B. and C.O.D.; investigation, A.D., F.A., L.B. and C.O.D.; resources, R.P. and O.B.; data curation, C.O.D.; writing—original draft preparation, L.B. and C.O.D.; writing—review and editing, R.P., O.B. and A.I.S.; visualisation, L.B. and C.O.D.; supervision, O.B. and A.I.S.; project administration, C.O.D. and L.B.; funding acquisition, R.P. and O.B. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data generated or analysed during this study are included in the manuscript and the references cited therein. No additional data are available.

Acknowledgments

The TerraSAR-X image data used in this study were provided by the TerraSAR-X Science Service System (Proposals MTH 1118 and LAN-3156), while the WorldView-2 image data were provided by European Space Imaging (EUSI). The Sentinel-1 and Sentinel-2 image data are freely available via the Copernicus Open Access Hub. The activity related to EO images was supported in recent years by different projects funded by the European Commission under the FP7 Programme (TELEIOS) and the H2020 Programme (CANDELA and ExtremeEarth). Some of the results were obtained during my work in the DIKD (Data Intelligence and Knowledge Discovery) team. I would like to thank the project partners and DLR colleagues. A warm thanks to Gottfried Schwarz for his support in providing valuable comments to improve this contribution. Many thanks to Chandrabali Karmakar for her support related to the LDA algorithm developed within the ExtremeEarth project and used for comparison in this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI: artificial intelligence
AI4EO: artificial intelligence for Earth observation
AI4Health: artificial intelligence for Health applications
CRC: colorectal cancer
CRCLM: colorectal cancer liver metastasis
CT: computed tomography
DL: deep learning
DLR: Deutsches Zentrum für Luft- und Raumfahrt (German Aerospace Centre)
DP: digital pathology
EO: Earth observation
ESA: European Space Agency
HE: haematoxylin–eosin
LARC: locally advanced rectal cancer
MRI: magnetic resonance imaging
SAR: synthetic aperture radar
SVM: Support Vector Machine

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Figures and Tables

Figure 1 The proposed workflow methodology for semantic classification of EO and medical images. The steps to be followed are: data acquisition, patch tiling, feature extraction (Gabor filters for EO images, Weber descriptors for medical images), SVM-based active learning with expert feedback, and the generation of semantic maps and knowledge graphs. The details are given in Section 2.1.


Figure 2 Semantic classification results for the city of Bucharest, Romania, using TerraSAR-X (top) and WorldView-2 [26] (bottom). The images were acquired on 15 August 2009 and 13 December 2012, respectively.


Figure 3 A dataset of the city of Berlin, Germany (acquired by GF-3 [21] and TerraSAR-X [25]). (Top part, from left to right): A quick-look view of a GF-3 image acquired on 27 July 2018 and its semantic classification results. (Bottom part, from left to right): A quick-look view of a TerraSAR-X image acquired on 19 September 2008 and its semantic classification results.


Figure 4 A dataset of the city of Vancouver, Canada (acquired by TerraSAR-X [25] and RADARSAT-2 [22]). (Top part, from left to right): A quick-look view of a TerraSAR-X image acquired on 20 April 2008, and its semantic classification results. (Bottom part, from left to right): A quick-look view of a RADARSAT-2 image acquired on 4 April 2008 and its semantic classification results.


Figure 5 A dataset covering Albania and Greece (acquired by TerraSAR-X [39], Sentinel-1 [40], and Sentinel-2 [41]). (Top part, from left to right): A quick-look view of a TerraSAR-X image acquired on 11 January 2017 and its semantic classification results. (Centre part, from left to right): A quick-look view of a Sentinel-1 image acquired on 15 January 2017 and its semantic classification results. (Bottom part, from left to right): An RGB quick-look view of a Sentinel-2 image acquired on 15 October 2016 and its semantic classification results.


Figure 6 Colorectal adenocarcinoma image (ID KS-172), magnification ×10, and the semantic classification results using two different patch sizes: 48 × 48 (top right) and 24 × 24 (bottom right).


Figure 7 (Left part) An RGB image (ID KS-172), magnification ×10, selected from the third dataset that corresponds to colorectal adenocarcinoma. (Right part) The semantic classification results using a patch size of 16 × 16 pixels.


Figure 8 (Left part) An RGB image (ID KS-172), total magnification ×100, selected from the third dataset that corresponds to colorectal adenocarcinoma. (Right part) The unsupervised classification results using a patch size of 4 × 4 pixels. The colours (right part) have been assigned without human supervision to eight classes which, however, do not correspond to cellular or tissue structures. The colour band is added to allow counting the number of classes found through unsupervised classification.


Figure 9 (Top to bottom) and (left to right) Four RGB images (LungFCP-01-0001_b1, LungFCP-01-0002_b1, LungFCP-01-0001_b3, and LungFCP-01-0001_b4), total magnification ×20, selected from the second dataset that correspond to different lung adenocarcinomas. The numbered green squares mark the sampling zones (sub-images) used to run our model.


Figure 10 (Top part, left) An RGB image selected from sub-image 15 (total magnification ×40) in Figure 9 that corresponds to lung adenocarcinoma. (Top part, right) The semantic classification results using a patch size of 48 × 48 pixels. (Bottom part, right) The semantic classification results using a patch size of 32 × 32 pixels.


Figure 11 (Top part, left) An RGB image selected from sub-image 15 in Figure 9 (total magnification ×40) that corresponds to lung adenocarcinoma. (Top part, right) The semantic classification results using a patch size of 16 × 16 pixels. (Bottom part, right) The semantic classification results using a patch size of 8 × 8 pixels.


Figure 12 (Top part, left) An RGB image selected from the sub-image with ID 15 (see Figure 9), total magnification ×40 (objective ×4), that corresponds to lung adenocarcinoma. The unsupervised classification results are obtained using a patch size of 4 × 4 pixels. (Top part, right) The results with 6 classes/topics. (Centre part, right) The results with 11 classes/topics. (Bottom part, right) The results with 12 classes/topics. The colours on the colour bands under the images in the right part have been assigned without human supervision to six (top), eleven (middle), and twelve (bottom) classes which, however, do not correspond to cellular or tissue structures. The colour bands are added to allow counting the number of classes found through unsupervised classification.


Figure 13 Computed tomography (CT) of a human lung from a patient diagnosed with cancer. (Top part, left) A grey image selected from the dataset that corresponds to lung cancer. (Top part, right) The semantic classification results using a patch size of 48 × 48 pixels. (Bottom part, right) The semantic classification results using a patch size of 32 × 32 pixels.


Figure 14 Computed tomography (CT) of a human lung from a patient diagnosed with cancer. (Top part, left) A grey image selected from the dataset that corresponds to lung cancer. (Top part, right) The semantic classification results using a patch size of 24 × 24 pixels. (Bottom part, right) The semantic classification results using a patch size of 16 × 16 pixels.


Figure 15 CT image of a healthy lung (top) and CT image of a lung with cancer (bottom), along with their corresponding semantic classification using patches of 16 × 16 pixels.


Figure 16 Computed tomography (CT) of a human lung from a healthy patient and from patients diagnosed with different diseases (e.g., cancer). (Left part) Three grey-level images selected from the dataset that were acquired by CT scan. The unsupervised classification results are obtained using a patch size of 4 × 4 pixels with 12 classes/topics. (Top part, right) The results for a control image that does not contain any abnormalities or diseases. (Centre part, right) The results for an image of a patient who developed a cancerous tumour in the lung. (Bottom part, right) The results for an image of the same patient who developed a cancerous tumour in a different part of the lung. The colours on the colour bands under the images in the right part have been assigned without human supervision to twelve classes which, however, do not correspond to any tissue or organ structure. The colour bands are added to allow counting the number of classes found through unsupervised classification.


Figure 17 A model of the city of Bucharest, Romania: (left side) a comparison of the semantic classes generated by each instrument and (right side) a comparison of the classes that define a green city (broadleaf forest, grassland, rivers, sports grounds) and the classes that describe the infrastructure of the city (admin. compounds and monument areas, bridges, cemeteries, high-density residential areas, medium-density residential areas, mixed urban areas, parking areas, roads).


Figure 18 A model of the city of Berlin, Germany: (left side) a comparison of the semantic classes generated by each instrument and (right side) a comparison of the classes that define the concept green city and the classes that define the infrastructure of the city.


Figure 19 A model of the city of Vancouver, Canada: (left side) a comparison of the semantic classes generated by each instrument and (right side) a comparison of the classes that define a green city and the classes that form the infrastructure of the city.


Figure 20 Knowledge graph linking the semantic classes of two patients with lung tumours.


Figure 21 (a) The first comparison is “Machine learning” and “Remote sensing imaging” versus “Machine learning” and “Medical imaging”. (b) The second comparison is “Machine learning” and “EO images” versus “Machine learning” and “CT images”. (c) The last comparison is “Active learning” and “EO” versus “Active learning” and “CT”.


Figure 22 The proposed flowchart to achieve a global health model for a city includes the EO model, the medical model, and other information (pollution level over a city analysed using, for example, Sentinel-5P) that can help create such a model.


Figure 23 A domain ontology linked to a health application. The short abbreviations are L for Labels, EO for Earth Observation, CT for computed tomography, and AT for atmospheric.


Figure 24 Knowledge-graph representation of a model adapted to the global health of a city domain.


Parameters used for model training on EO and medical images with SVM.

Parameter EO Images Medical Images
Kernel type Chi-squared kernel Chi-squared kernel
Multi-class strategy One-against-all One-against-all
Regularisation parameter (C) 1.0 (default) 1.0 (default)
Gamma (kernel coefficient) 1/(number of features) 1/(number of features)
Cross-validation strategy 5-fold cross-validation 5-fold cross-validation
Active learning cycles 3–5 iterations depending on dataset size 3–5 iterations depending on dataset size

List of selected EO images with their parameters.

Location Instrument Type Mode No. of Sensor Bands/Selected Bands Resolution Polarisation Patch Size (Pixels) No. of Patches
Berlin, Germany Gaofen-3 C-band SAR SpotLight (SL) 1/1 1 m HH 256 × 256 2080
TerraSAR-X X-band SAR Multi-look Ground range Detected (MGD) 1/1 2.9 m HH 160 × 160 1025
Bucharest, Romania TerraSAR-X X-band SAR Multi-look Ground range Detected (MGD) 1/1 2.9 m HH 160 × 160 4455
WorldView-2 Multi-spectral - 8/3 (RGB) 1.87 m - 100 × 100 33,930
Vancouver, Canada RADARSAT-2 C-band SAR Extended high (EH) 1/1 13.5 m HH 160 × 160 660
TerraSAR-X X-band SAR Multi-look Ground range Detected (MGD) 1/1 2.9 m HH 160 × 160 825
Albania and Greece TerraSAR-X X-band SAR Multi-look Ground range Detected (MGD) 1/1 2.9 m HH 160 × 160 1872
Sentinel-1 C-band SAR Interferometric Wide swath (IW) 1/1 20 m VV/VH 128 × 128 26,260
Sentinel-2 Multi-spectral - 13/3 (RGB) 10/20/60 m - 120 × 120 8281

List of selected medical images with their parameters.

Data Instrument No. of Bands/Selected Bands Image Dimension (Pixels) Sub-Images (Pixels) Patch Size (Pixels) No. of Patches
Colorectal adenocarcinoma Optical microscopy 3/3 1024 × 768 - 4 × 4 49,152
Lung adenocarcinoma Optical microscopy 3/3 avg. 9688 × 9832 890 × 801 4 × 4 44,400
Non-small cell lung cancer Computed tomography (CT) scan 1/1 avg. 1802 × 884 1372 × 672 4 × 4 57,624

References

1. Sudmanns, M.; Tiede, D.; Lang, S.; Bergstedt, H.; Trost, G.; Augustin, H.; Baraldi, A.; Blaschke, T. Big Earth Data: Disruptive Changes in Earth Observation Data Management and Analysis?. Int. J. Digit. Earth; 2020; 13, pp. 832-850. [DOI: https://dx.doi.org/10.1080/17538947.2019.1585976]

2. Satellite Missions Directory—Earth Observation Missions—eoPortal. Available online: https://eoportal.org/web/eoportal/satellite-missions/ (accessed on 28 April 2022).

3. DLR—Earth Observation Center—60 Petabytes for the German Satellite Data Archive D-SDA. Available online: https://www.dlr.de/eoc/en/desktopdefault.aspx/tabid-12632/22039_read-51751 (accessed on 28 April 2022).

4. Copernicus Annual Reports, Open Access Hub. Available online: https://dataspace.copernicus.eu/news/2024-11-5-copernicus-data-space-ecosystem-cdse-releases-annual-report-2023 (accessed on 15 October 2025).

5. Dumitru, C.O.; Schwarz, G.; Datcu, M. Semantic Labeling of Globally Distributed Urban and Nonurban Satellite Images Using High-Resolution SAR Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.; 2021; 14, pp. 6009-6068. [DOI: https://dx.doi.org/10.1109/JSTARS.2021.3084314]

6. Mallappallil, M.; Sabu, J.; Gruessner, A.; Salifu, M. A Review of Big Data and Medical Research. SAGE Open Med.; 2020; 8, 2050312120934839. [DOI: https://dx.doi.org/10.1177/2050312120934839] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/32637104]

7. Sfikas, G. List of Medical Imaging Datasets. Github Repository. Available online: https://github.com/sfikas/medical-imaging-datasets (accessed on 28 April 2022).

8. The Cancer Imaging Archive (TCIA). Available online: https://www.cancerimagingarchive.net/ (accessed on 28 April 2022).

9. High Quality Pathology Images—WebPathology. Available online: http://webpathology.com (accessed on 28 April 2022).

10. Dumitru, C.O.; Cui, S.; Faur, D.; Datcu, M. Data Analytics for Rapid Mapping: Case Study of a Flooding Event in Germany and the Tsunami in Japan Using Very High Resolution SAR Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.; 2015; 8, pp. 114-129. [DOI: https://dx.doi.org/10.1109/JSTARS.2014.2320777]

11. Dumitru, C.O.; Schwarz, G.; Datcu, M. AL4SLEO: An Active Learning Solution for the Semantic Labelling of Earth Observation Satellite Images—Part 1. Benchmarks and Hybrid Algorithms in Optimization and Applications; Yang, X.-S. Springer Tracts in Nature-Inspired Computing Springer Nature: Singapore, 2023; pp. 105-118. ISBN 978-981-99-3970-1

12. Dumitru, C.O.; Schwarz, G.; Datcu, M. AL4SLEO: An Active Learning Solution for the Semantic Labelling of Earth Observation Satellite Images—Part 2. Benchmarks and Hybrid Algorithms in Optimization and Applications; Yang, X.-S. Springer Tracts in Nature-Inspired Computing Springer Nature: Singapore, 2023; pp. 119-146. ISBN 978-981-99-3970-1

13. MPEG-7 Standard. Available online: https://mpeg.chiariglione.org/standards/mpeg-7.html (accessed on 15 October 2025).

14. Manjunath, B.S.; Ma, W.Y. Texture Features for Browsing and Retrieval of Image Data. IEEE Trans. Pattern Anal. Mach. Intell.; 1996; 18, pp. 837-842. [DOI: https://dx.doi.org/10.1109/34.531803]

15. Chen, J.; Shan, S.; He, C.; Zhao, G.; Pietikäinen, M.; Chen, X.; Gao, W. WLD: A Robust Local Image Descriptor. IEEE Trans. Pattern Anal. Mach. Intell.; 2010; 32, pp. 1705-1720. [DOI: https://dx.doi.org/10.1109/TPAMI.2009.155]

16. Cui, S.; Dumitru, C.O.; Datcu, M. Semantic Annotation in Earth Observation Based on Active Learning. Int. J. Image Data Fusion.; 2014; 5, pp. 152-174. [DOI: https://dx.doi.org/10.1080/19479832.2013.858778]

17. Sentinels Scientific Data Hub. Available online: https://scihub.copernicus.eu/dhus/#/home (accessed on 28 April 2022).

18. EOWEB GeoPortal—TerraSAR-X Archive Portal. Available online: https://eoweb.dlr.de/egp/ (accessed on 28 April 2022).

19. Dumitru, C.O.; Datcu, M. Information Content of Very High Resolution SAR Images: Study of Feature Extraction and Imaging Parameters. IEEE Trans. Geosci. Remote Sens.; 2013; 51, pp. 4591-4610. [DOI: https://dx.doi.org/10.1109/TGRS.2013.2265413]

20. The European Space Agency. Earth Observation Satellite Missions. Available online: https://earth.esa.int/eogateway/search?skipDetection=true&text=&category=Missions (accessed on 28 April 2022).

21. Gaofen-3 Sensor Parameter Description. Available online: https://directory.eoportal.org/web/eoportal/satellite-missions/g/gaofen-3 (accessed on 28 April 2022).

22. Canadian Space Agency. RADARSAT Satellites: Technical Comparison. Available online: https://www.asc-csa.gc.ca/eng/satellites/radarsat/technical-features/radarsat-comparison.asp (accessed on 28 April 2022).

23. Sentinel-1—Missions. Available online: https://sentiwiki.copernicus.eu/web/s1-mission (accessed on 15 October 2025).

24. Sentinel-2—Missions. Available online: https://sentinels.copernicus.eu/copernicus/sentinel-2 (accessed on 15 October 2025).

25. TerraSAR-X Sensor Parameter Description and Data Access. Available online: https://sss.terrasar-x.dlr.de/ (accessed on 28 April 2022).

26. WorldView Sensor Parameter Description. Available online: https://www.satimagingcorp.com/satellite-sensors/worldview-2/ (accessed on 28 April 2022).

27. CANDELA (Copernicus Access Platform Intermediate Layers Small Scale Demonstrator) Project. Available online: http://www.candela-h2020.eu/ (accessed on 28 April 2022).

28. Dumitru, C.O.; Schwarz, G.; Pulak-Siwiec, A.; Kulawik, B.; Albughdadi, M.; Lorenzo, J.; Datcu, M. Understanding Satellite Images: A Data Mining Module for Sentinel Images. Big Earth Data; 2020; 4, pp. 367-408. [DOI: https://dx.doi.org/10.1080/20964471.2020.1820168]

29. Murphy, A. Radiological Medical Image Datasets Build for Artificial Intelligence, Radiopaedia.Org. Available online: https://radiopaedia.org/articles/imaging-data-sets-artificial-intelligence (accessed on 28 April 2022).

30. WebPathology: Case Colo-Rectal Adenocarcinoma. Available online: https://www.webpathology.com/search-result?query=colorectal%20adenocarcinoma (accessed on 27 April 2022).

31. Aerts, H.J.W.L.; Velazquez, E.R.; Leijenaar, R.T.H.; Parmar, C.; Grossmann, P.; Carvalho, S.; Bussink, J.; Monshouwer, R.; Haibe-Kains, B.; Rietveld, D. . Decoding Tumour Phenotype by Noninvasive Imaging Using a Quantitative Radiomics Approach. Nat. Commun.; 2014; 5, 4006. [DOI: https://dx.doi.org/10.1038/ncomms5006]

32. WebPathology: Case Lung Adenocarcinoma. Available online: https://www.webpathology.com/case.asp?case=416 (accessed on 28 April 2022).

33. Rusu, M.; Rajiah, P.; Gilkeson, R.; Yang, M.; Donatelli, C.; Thawani, R.; Jacono, F.J.; Linden, P.; Madabhushi, A. Co-Registration of Pre-Operative CT with Ex Vivo Surgically Excised Ground Glass Nodules to Define Spatial Extent of Invasive Adenocarcinoma on in Vivo Imaging: A Proof-of-Concept Study. Eur. Radiol.; 2017; 27, pp. 4209-4217. [DOI: https://dx.doi.org/10.1007/s00330-017-4813-0]

34. Aerts, H.J.W.L.; Wee, L.; Rios Velazquez, E.; Leijenaar, R.T.H.; Parmar, C.; Grossmann, P.; Carvalho, S.; Bussink, J.; Monshouwer, R.; Haibe-Kains, B. . Data from NSCLC-Radiomics. 2019; Available online: https://www.cancerimagingarchive.net/collection/nsclc-radiomics/ (accessed on 27 April 2022).

35. Colapicchioni, A. KES: Knowledge Enabled Services for Better EO Information Use. Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS’04); Anchorage, AK, USA, 20–24 September 2004; IEEE: Anchorage, AK, USA, 2004; Volume 1, pp. 176-179.

36. Dumitru, C.O.; Schwarz, G.; Datcu, M. Machine Learning Techniques for Knowledge Extraction from Satellite Images: Application to Specific Area Types. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci.; 2021; XLIII-B3-2021, pp. 455-462. [DOI: https://dx.doi.org/10.5194/isprs-archives-XLIII-B3-2021-455-2021]

37. Earth from Space: Bucharest, Romania. 2021; Available online: https://www.esa.int/ESA_Multimedia/Videos/2021/04/Earth_from_Space_Bucharest (accessed on 27 April 2022).

38. Dumitru, C.O.; Schwarz, G.; Cui, S.; Datcu, M. Improved Image Classification by Proper Patch Size Selection: TerraSAR-X vs. Sentinel-1A. Proceedings of the 2016 International Conference on Systems, Signals and Image Processing (IWSSIP); Bratislava, Slovakia, 23–25 May 2016; IEEE: New York, NY, USA, pp. 1-4.

39. TerraSAR-X and TanDEM-X—Earth Online. Available online: https://earth.esa.int/eogateway/missions/terrasar-x-and-tandem-x (accessed on 15 October 2025).

40. Sentinel-1—Sentinel Online. Available online: https://sentinels.copernicus.eu/copernicus/sentinel-1 (accessed on 15 October 2025).

41. ESA—Sentinel-2. Available online: https://www.esa.int/Applications/Observing_the_Earth/Copernicus/Sentinel-2 (accessed on 15 October 2025).

42. Karmakar, C.; Dumitru, C.O.; Schwarz, G.; Datcu, M. Feature-Free Explainable Data Mining in SAR Images Using Latent Dirichlet Allocation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.; 2021; 14, pp. 676-689. [DOI: https://dx.doi.org/10.1109/JSTARS.2020.3039012]

43. Russell, B.C.; Torralba, A.; Murphy, K.P.; Freeman, W.T. LabelMe: A Database and Web-Based Tool for Image Annotation. Int. J. Comput. Vis.; 2008; 77, pp. 157-173. [DOI: https://dx.doi.org/10.1007/s11263-007-0090-8]

44. Smeulders, A.W.M.; Worring, M.; Santini, S.; Gupta, A.; Jain, R. Content-Based Image Retrieval at the End of the Early Years. IEEE Trans. Pattern Anal. Mach. Intell.; 2000; 22, pp. 1349-1380. [DOI: https://dx.doi.org/10.1109/34.895972]

45. Shyu, C.-R.; Klaric, M.; Scott, G.J.; Barb, A.S.; Davis, C.H.; Palaniappan, K. GeoIRIS: Geospatial Information Retrieval and Indexing System—Content Mining, Semantics Modeling, and Complex Queries. IEEE Trans. Geosci. Remote Sens.; 2007; 45, pp. 839-852. [DOI: https://dx.doi.org/10.1109/TGRS.2006.890579] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/18270555]

46. Witten, I.H.; Frank, E.; Hall, M.A. Data Mining: Practical Machine Learning Tools and Techniques; 3rd ed. Morgan Kaufmann Series in Data Management Systems Morgan Kaufmann: Burlington, MA, USA, 2011; ISBN 978-0-12-374856-0

47. Wang, S.; Cao, J.; Yu, P.S. Deep Learning for Spatio-Temporal Data Mining: A Survey. arXiv; 2019; arXiv: 1906.04928[DOI: https://dx.doi.org/10.1109/TKDE.2020.3025580]

48. Huang, Z.; Dumitru, C.O.; Ren, J. Physics-Aware Feature Learning of Sar Images with Deep Neural Networks: A Case Study. Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS; Brussels, Belgium, 11–16 July 2021; pp. 1264-1267.

49. Stepinski, T.F.; Netzel, P.; Jasiewicz, J. LandEx—A GeoWeb Tool for Query and Retrieval of Spatial Patterns in Land Cover Datasets. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.; 2014; 7, pp. 257-266. [DOI: https://dx.doi.org/10.1109/JSTARS.2013.2260727]

50. Espinoza-Molina, D.; Manilici, V.; Dumitru, C.; Reck, C.; Cui, S.; Rotzoll, H.; Hofmann, M.; Schwarz, G.; Datcu, M. The Earth Observation Image Librarian (EOLIB): The Data Mining Component of the TerraSAR-X Payload Ground Segment. Proceedings of the Big Data from Space (BiDS’16); Santa Cruz de Tenerife, Spain, 15–17 March 2016; Soille, P.; Marchetti, P.G. European Union: Brussels, Belgium, 2016; pp. 228-231.

51. Zhou, J.; Cao, R.; Kang, J.; Guo, K.; Xu, Y. An Efficient High-Quality Medical Lesion Image Data Labeling Method Based on Active Learning. IEEE Access; 2020; 8, pp. 144331-144342. [DOI: https://dx.doi.org/10.1109/ACCESS.2020.3014355]

52. Esteva, A.; Chou, K.; Yeung, S.; Naik, N.; Madani, A.; Mottaghi, A.; Liu, Y.; Topol, E.; Dean, J.; Socher, R. Deep Learning-Enabled Medical Computer Vision. NPJ Digit. Med.; 2021; 4, 5. [DOI: https://dx.doi.org/10.1038/s41746-020-00376-2]

53. Baumgartner, C. Herausforderungen und Chancen von KI in der Medizinischen Bildgebung. Exzellenzcluster “Maschinelles Lernen: Neue Perspektiven für die Wissenschaft“, Universität Tübingen. Available online: https://www.mlmia-unitue.de/uploads/lndw-final.pdf (accessed on 28 April 2022).

54. Dumitru, C.O.; Schwarz, G.; Datcu, M. Land Cover Semantic Annotation Derived from High-Resolution SAR Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.; 2016; 9, pp. 2215-2232. [DOI: https://dx.doi.org/10.1109/JSTARS.2016.2549557]

55. Cook, D.J.; Holder, L.B. Mining Graph Data; Wiley-Interscience: Hoboken, NJ, USA, 2007; ISBN 978-0-471-73190-0

56. Samatova, N.F. Practical Graph Mining with R; Chapman & Hall/CRC Data Mining and Knowledge Discovery Series; Taylor & Francis: Boca Raton, FL, USA, 2014; ISBN 978-1-4398-6084-7

57. Vieira, A. Knowledge Representation in Graphs Using Convolutional Neural Networks. arXiv; 2016; [DOI: https://dx.doi.org/10.48550/arXiv.1612.02255] arXiv: 1612.02255

58. Sullivan, D. Google Launches Knowledge Graph to Provide Answers, Not Just Links. Available online: https://www.techmeme.com/120516/p37#a120516p37 (accessed on 28 April 2022).

59. Dumitru, C.O.; Schwarz, G.; Datcu, M. Image Representation Alternatives for the Analysis of Satellite Image Time Series. Proceedings of the 2017 9th International Workshop on the Analysis of Multitemporal Remote Sensing Images (MultiTemp); Brugge, Belgium, 27–29 June 2017; pp. 1-4.

60. Karalis, N.; Mandilaras, G.; Koubarakis, M. Extending the YAGO2 Knowledge Graph with Precise Geospatial Knowledge. The Semantic Web—ISWC 2019; Ghidini, C.; Hartig, O.; Maleshkova, M.; Svátek, V.; Cruz, I.; Hogan, A.; Song, J.; Lefrançois, M.; Gandon, F. Lecture Notes in Computer Science Springer International Publishing: Cham, Switzerland, 2019; Volume 11779, pp. 181-197. ISBN 978-3-030-30795-0

61. Ji, S.; Pan, S.; Cambria, E.; Marttinen, P.; Yu, P.S. A Survey on Knowledge Graphs: Representation, Acquisition and Applications. IEEE Trans. Neural Netw. Learn. Syst.; 2022; 33, pp. 494-514. [DOI: https://dx.doi.org/10.1109/TNNLS.2021.3070843]

62. Réjichi, S.; Chaabane, F.; Tupin, F. Expert Knowledge-Based Method for Satellite Image Time Series Analysis and Interpretation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.; 2015; 8, pp. 2138-2150. [DOI: https://dx.doi.org/10.1109/JSTARS.2015.2433257]

63. Rotmensch, M.; Halpern, Y.; Tlimat, A.; Horng, S.; Sontag, D. Learning a Health Knowledge Graph from Electronic Medical Records. Sci. Rep.; 2017; 7, 5994. [DOI: https://dx.doi.org/10.1038/s41598-017-05778-z] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/28729710]

64. Roscher, R.; Bohn, B.; Duarte, M.F.; Garcke, J. Explainable Machine Learning for Scientific Insights and Discoveries. IEEE Access; 2020; 8, pp. 42200-42216. [DOI: https://dx.doi.org/10.1109/ACCESS.2020.2976199]

65. Li, X.-H.; Cao, C.C.; Shi, Y.; Bai, W.; Gao, H.; Qiu, L.; Wang, C.; Gao, Y.; Zhang, S.; Xue, X. . A Survey of Data-Driven and Knowledge-Aware eXplainable AI. IEEE Trans. Knowl. Data Eng.; 2022; 34, pp. 29-49. [DOI: https://dx.doi.org/10.1109/TKDE.2020.2983930]

66. Amann, J.; Blasimme, A.; Vayena, E.; Frey, D.; Madai, V.I. The Precise4Q consortium Explainability for Artificial Intelligence in Healthcare: A Multidisciplinary Perspective. BMC Med. Inform. Decis. Mak.; 2020; 20, 310. [DOI: https://dx.doi.org/10.1186/s12911-020-01332-6]

67. TELEIOS Project, Deliverable D6.4 “The Virtual Observatory for TerraSAR-X Data and Applications—Phase II”. Available online: https://cordis.europa.eu/project/id/257662 (accessed on 28 April 2022).

68. Tharwat, A.; Schenck, W. A Survey on Active Learning: State-of-the-Art, Practical Challenges and Research Directions. Mathematics; 2023; 11, 820. [DOI: https://dx.doi.org/10.3390/math11040820]

69. Mosqueira-Rey, E.; Hernández-Pereira, E.; Alonso-Ríos, D.; Bobes-Bascarán, J.; Fernández-Leal, Á. Human-in-the-Loop Machine Learning: A State of the Art. Artif. Intell. Rev.; 2023; 56, pp. 3005-3054. [DOI: https://dx.doi.org/10.1007/s10462-022-10246-w]

70. Lenczner, G.; Chan-Hon-Tong, A.; Le Saux, B.; Luminari, N.; Le Besnerais, G. DIAL: Deep Interactive and Active Learning for Semantic Segmentation in Remote Sensing. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.; 2022; 15, pp. 3376-3389. [DOI: https://dx.doi.org/10.1109/JSTARS.2022.3166551]

71. Cong, L.; Feng, W.; Yao, Z.; Zhou, X.; Xiao, W. Deep Learning Model as a New Trend in Computer-Aided Diagnosis of Tumor Pathology for Lung Cancer. J. Cancer; 2020; 11, pp. 3615-3622. [DOI: https://dx.doi.org/10.7150/jca.43268]

72. Sakamoto, T.; Furukawa, T.; Lami, K.; Pham, H.H.N.; Uegami, W.; Kuroda, K.; Kawai, M.; Sakanashi, H.; Cooper, L.A.D.; Bychkov, A. . A Narrative Review of Digital Pathology and Artificial Intelligence: Focusing on Lung Cancer. Transl. Lung Cancer Res.; 2020; 9, pp. 2255-2276. [DOI: https://dx.doi.org/10.21037/tlcr-20-591] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33209648]

73. Viswanathan, V.S.; Toro, P.; Corredor, G.; Mukhopadhyay, S.; Madabhushi, A. The State of the Art for Artificial Intelligence in Lung Digital Pathology. J. Pathol.; 2022; 257, pp. 413-429. [DOI: https://dx.doi.org/10.1002/path.5966] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35579955]

74. Cai, Y.-W.; Dong, F.-F.; Shi, Y.-H.; Lu, L.-Y.; Chen, C.; Lin, P.; Xue, Y.-S.; Chen, J.-H.; Chen, S.-Y.; Luo, X.-B. Deep Learning Driven Colorectal Lesion Detection in Gastrointestinal Endoscopic and Pathological Imaging. World J. Clin. Cases; 2021; 9, pp. 9376-9385. [DOI: https://dx.doi.org/10.12998/wjcc.v9.i31.9376]

75. Chassagnon, G.; De Margerie-Mellon, C.; Vakalopoulou, M.; Marini, R.; Hoang-Thi, T.-N.; Revel, M.-P.; Soyer, P. Artificial Intelligence in Lung Cancer: Current Applications and Perspectives. Jpn. J. Radiol.; 2023; 41, pp. 235-244. [DOI: https://dx.doi.org/10.1007/s11604-022-01359-x]

76. Lee, J.; Hwang, E.; Kim, H.; Park, C. A Narrative Review of Deep Learning Applications in Lung Cancer Research: From Screening to Prognostication. Transl. Lung Cancer Res.; 2022; 11, pp. 1217-1229. [DOI: https://dx.doi.org/10.21037/tlcr-21-1012]

77. Astley, J.R.; Wild, J.M.; Tahir, B.A. Deep Learning in Structural and Functional Lung Image Analysis. Br. J. Radiol.; 2022; 95, 20201107. [DOI: https://dx.doi.org/10.1259/bjr.20201107]

78. Sourlos, N.; Wang, J.; Nagaraj, Y.; van Ooijen, P.; Vliegenthart, R. Possible Bias in Supervised Deep Learning Algorithms for CT Lung Nodule Detection and Classification. Cancers; 2022; 14, 3867. [DOI: https://dx.doi.org/10.3390/cancers14163867]

79. Hou, M.; Sun, J.-H. Emerging Applications of Radiomics in Rectal Cancer: State of the Art and Future Perspectives. World J. Gastroenterol.; 2021; 27, pp. 3802-3814. [DOI: https://dx.doi.org/10.3748/wjg.v27.i25.3802]

80. Joseph, J.; LePage, E.M.; Cheney, C.P.; Pawa, R. Artificial Intelligence in Colonoscopy. World J. Gastroenterol.; 2021; 27, pp. 4802-4817. [DOI: https://dx.doi.org/10.3748/wjg.v27.i29.4802]

81. Stanzione, A.; Verde, F.; Romeo, V.; Boccadifuoco, F.; Mainenti, P.P.; Maurea, S. Radiomics and Machine Learning Applications in Rectal Cancer: Current Update and Future Perspectives. World J. Gastroenterol.; 2021; 27, pp. 5306-5321. [DOI: https://dx.doi.org/10.3748/wjg.v27.i32.5306]

82. Liang, F.; Wang, S.; Zhang, K.; Liu, T.-J.; Li, J.-N. Development of Artificial Intelligence Technology in Diagnosis, Treatment, and Prognosis of Colorectal Cancer. World J. Gastrointest. Oncol.; 2022; 14, pp. 124-152. [DOI: https://dx.doi.org/10.4251/wjgo.v14.i1.124] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35116107]

83. Davri, A.; Birbas, E.; Kanavos, T.; Ntritsos, G.; Giannakeas, N.; Tzallas, A.T.; Batistatou, A. Deep Learning on Histopathological Images for Colorectal Cancer Diagnosis: A Systematic Review. Diagnostics; 2022; 12, 837. [DOI: https://dx.doi.org/10.3390/diagnostics12040837]

84. Thakur, N.; Yoon, H.; Chong, Y. Current Trends of Artificial Intelligence for Colorectal Cancer Pathology Image Analysis: A Systematic Review. Cancers; 2020; 12, 1884. [DOI: https://dx.doi.org/10.3390/cancers12071884] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/32668721]

85. Zhang, S.; Yu, M.; Chen, D.; Li, P.; Tang, B.; Li, J. Role of MRI-based Radiomics in Locally Advanced Rectal Cancer (Review). Oncol. Rep.; 2021; 47, 34. [DOI: https://dx.doi.org/10.3892/or.2021.8245]

86. Alshohoumi, F.; Al-Hamdani, A.; Hedjam, R.; AlAbdulsalam, A.; Al Zaabi, A. A Review of Radiomics in Predicting Therapeutic Response in Colorectal Liver Metastases: From Traditional to Artificial Intelligence Techniques. Healthcare; 2022; 10, 2075. [DOI: https://dx.doi.org/10.3390/healthcare10102075] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36292522]

87. Wang, P.; Deng, C.; Wu, B. Magnetic Resonance Imaging-Based Artificial Intelligence Model in Rectal Cancer. World J. Gastroenterol.; 2021; 27, pp. 2122-2130. [DOI: https://dx.doi.org/10.3748/wjg.v27.i18.2122]

88. Bedrikovetski, S.; Dudi-Venkata, N.; Kroon, H.; Seow, W.; Vather, R.; Carneiro, G.; Moore, J.; Sammour, T. Artificial Intelligence for Pre-Operative Lymph Node Staging in Colorectal Cancer: A Systematic Review and Meta-Analysis. BMC Cancer; 2021; 21, [DOI: https://dx.doi.org/10.1186/s12885-021-08773-w]

89. Viganò, L.; Jayakody Arachchige, V.S.; Fiz, F. Is Precision Medicine for Colorectal Liver Metastases Still a Utopia? New Perspectives by Modern Biomarkers, Radiomics, and Artificial Intelligence. World J. Gastroenterol.; 2022; 28, pp. 608-623. [DOI: https://dx.doi.org/10.3748/wjg.v28.i6.608]

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).