
1 Introduction

Large-scale hydrological studies that aim to draw generalized conclusions about hydrological models and processes require data from large samples of catchments to capture spatiotemporal hydrological differences across scales (Addor et al., 2017; Coxon et al., 2020). Various studies have used large-sample datasets to investigate the impacts of climate change and anthropogenic influences on hydrological behavior (Van Loon et al., 2022; Feng et al., 2023), to predict hydrometeorological variables (Feng et al., 2020; Kratzert et al., 2018; Lees et al., 2021; Mangukiya et al., 2023), to classify catchments and assess hydrological similarity (Fang et al., 2022; Dimitriadis et al., 2021; Jehn et al., 2020), to make predictions in ungauged and data-sparse regions (Kratzert et al., 2019; Ma et al., 2021; Nearing et al., 2024), and to understand the drivers of extreme events and future hydrological changes (Mangukiya and Sharma, 2024; Alvarez-Garreton et al., 2021; Zhang et al., 2022; Das et al., 2024). The primary data required for hydrometeorological analyses are streamflow and its drivers, such as precipitation, temperature, solar radiation, evapotranspiration, wind, soil moisture, and relative humidity. Ideally, these hydrometeorological time series are complemented by catchment attributes believed to control hydrological processes, such as topography, land cover, soil, and geology (Addor et al., 2017). The availability of such catchment datasets offers a new perspective to the research community, supporting solutions for key issues in water management, the quantification and risk assessment of hydrologic extremes, the understanding of regional-scale hydrologic functioning, and the assessment of climate change impacts. Such datasets also accelerate research by saving researchers the many hours otherwise spent collecting and processing data from various sources.

The compilation of hydrometeorological time series and complementary attributes for large samples of catchments began in 2006 with the Model Parameter Estimation Experiment (MOPEX) dataset for the USA (Schaake et al., 2006). The MOPEX dataset was later extended by Newman et al. (2015) and Addor et al. (2017), resulting in the first Catchment Attributes and MEteorology for Large-sample Studies (CAMELS) dataset, which comprises 671 catchments in the contiguous United States (CONUS). Given the importance of such large-sample datasets for hydrometeorological studies, CAMELS and similar datasets have since been developed for various countries and regions, such as Chile (CAMELS-CL; Alvarez-Garreton et al., 2018), North America (HYSETS; Arsenault et al., 2020), Brazil (CAMELS-BR in Chagas et al., 2020; CABra in Almagro et al., 2021), Great Britain (CAMELS-GB; Coxon et al., 2020), China (CCAM; Hao et al., 2021), Australia (CAMELS-AUS; Fowler et al., 2021), Austria (LamaH-CE; Klingler et al., 2021), Switzerland (CAMELS-CH; Höge et al., 2023), Spain (BULL database; Senent-Aparicio et al., 2024), Sweden (CAMELS-SE; Teutschbein, 2024), and Germany (CAMELS-DE; Loritz et al., 2024). More recently, Caravan (Kratzert et al., 2023) combined the existing CAMELS and other large-sample datasets to facilitate global hydrological studies, and community-driven efforts have further extended Caravan with datasets for Denmark (Koch, 2022) and Israel (Efrat, 2023). Despite the increasing availability of large-sample hydrometeorological datasets globally, India still lacks a comprehensive dataset for large-sample hydrological studies.

In India, accessing analysis-ready datasets is difficult, and the available open-source datasets require additional quality checks (Goteti, 2023). The Central Water Commission (CWC) and various state government agencies provide water-related data through an online portal, India – Water Resources Information System (India-WRIS; https://indiawris.gov.in/wris/, last access: 18 November 2024). However, the related Geographic Information System (GIS) metadata, such as digitized gauge locations, catchment boundaries, and river networks, remain limited, and researchers must invest significant effort in digitizing and compiling the required information from CWC reports (Goteti, 2023). For meteorological time series, the India Meteorological Department (IMD) provides a nationwide gridded dataset of rainfall and temperature, and the National Centre for Medium Range Weather Forecasting (NCMRWF) provides various other meteorological variables in gridded format through the Indian Monsoon Data Assimilation and Analysis (IMDAA) reanalysis data services (https://rds.ncmrwf.gov.in/home, last access: 18 November 2024). However, such nationwide datasets are rarely aggregated to the catchment scale and require pre-processing to make them analysis-ready (Hao et al., 2021). For large-scale hydrological studies, searching for appropriate data, developing pre-processing methods, and formatting data consume considerable time and involve redundant effort, while contributing little to research advances (Beniston et al., 2012; Hao et al., 2021). Given the lack of analysis-ready datasets and the associated difficulties in data processing, it is unsurprising that large-sample hydrological studies are less common in India than in the USA or Europe. Overcoming these difficulties requires community-led efforts to develop an analysis-ready dataset for India (Goteti, 2023).

Goteti (2023) recently provided the first quality-controlled, publicly available hydrographic dataset, the Geospatial dataset for hydrologic analyses in India (GHI), which includes GIS data on gauge locations, catchment boundaries, and river networks, together with monthly and annual time series of precipitation, evapotranspiration, and runoff for 472 catchments in Peninsular India. Although the GHI dataset does not systematically provide catchment attributes representing hydroclimatic, land cover, and anthropogenic characteristics, it paved the way for the hydrologic community by providing reliable GIS metadata for a consistent set of catchments in Indian river basins. To fill this gap, we produced the CAMELS-IND dataset (Mangukiya et al., 2024), which provides, for 472 catchments in Peninsular India, daily catchment mean time series of 19 meteorological forcing variables; observed streamflow where available; streamflow predicted by a long short-term memory (LSTM)-based hydrological model (Mangukiya et al., 2023); and 211 catchment attributes representing topographic, climatic, hydrologic, land cover, soil, geological, and anthropogenic characteristics. The dataset is intended as a stepping stone towards making large-sample meteorological time series and attributes of Indian catchments available to the global and national hydrological communities. CAMELS-IND follows the same standards as previously developed CAMELS datasets to facilitate comparisons with catchments in other countries and inclusion in global hydrological studies. The following sections describe the motivation for CAMELS-IND and all the data it provides, including the data sources and how the hydrometeorological time series and static catchment attributes were prepared.

Figure 1

(a) Major river basins in Peninsular India, defined by the Central Water Commission (CWC), spread across various climate zones. (b) Geography of Peninsular India with major river basins, gauge locations, catchment boundaries, and an elevation map.

[Figure omitted. See PDF]

2 Motivation and rationale

India has hydrologically distinct catchments spread across arid, temperate, and tropical climate zones (Fig. 1a). These catchments have heterogeneous physical characteristics and are influenced to varying degrees by anthropogenic activities (Mangukiya and Sharma, 2024; Mangukiya et al., 2023). Despite these unique characteristics, Indian catchments are often underrepresented in global hydrological studies owing to the lack of analysis-ready datasets. CAMELS-IND aims to address this gap, providing an essential resource for researchers worldwide to investigate hydrological regimes under anthropogenic influences and changing climates and thereby helping to tackle water-related issues. CAMELS-IND includes over 100 arid catchments, which can be combined with arid-zone catchments from other datasets, such as CAMELS, CAMELS-CL, and CAMELS-AUS, to enable large-sample studies of arid-zone hydrology (Fowler et al., 2021). Furthermore, because rainfall in India is strongly seasonal, many Indian catchments are regulated by large and medium dams and often experience seasonal water limitations. These catchments therefore offer the global research community a large number of samples for addressing modeling challenges specific to catchments with such features.

Given this global use case, a critical choice during the development of CAMELS-IND was whether to use national or global datasets to extract the hydrometeorological time series and catchment attributes. Global datasets would facilitate intercontinental comparisons, whereas national datasets provide the highest-quality information available for India. So far, the CAMELS datasets of different countries have used the best available national data sources, drawing on the expertise of their creators. Where national datasets were unavailable, global datasets, such as the Global Lithological Map (GLiM) (Hartmann and Moosdorf, 2012) and GLobal HYdrogeology MaPS (GLHYMPS) (Gleeson et al., 2014), were used. Using national products allows global users, who may be unfamiliar with such products, to benefit from these local insights (Fowler et al., 2021). It also encourages national-scale studies by providing analysis-ready data from the best available sources within the country. Moreover, ongoing efforts such as Caravan (Kratzert et al., 2023), which produce consistent global datasets by deriving meteorological time series and catchment attributes from global data products, will complement the data produced from national sources and facilitate comparative studies. Therefore, we prioritized national data products, where possible, to produce CAMELS-IND.

3 Catchments and hydrological time series

3.1 Catchment description

The CWC and other state government agencies currently list 4824 gauge locations on India-WRIS from which users can obtain streamflow observations. However, only 645 of these gauges, located in Peninsular India, offer free access to data. The remaining stations either lack data or fall under the “classified data” category because they lie in transboundary river basins. Given the challenges in validating and extracting information from the available datasets in India, GHI introduced the first quality-controlled metadata in GIS format and listed 472 catchments with consistent and verified metadata out of the 645 gauge stations in Peninsular India (Goteti, 2023). The quality control performed by Goteti (2023) addressed essential aspects of the metadata, including the accuracy of station coordinates, the consistency of station descriptions, and the verification of delineated catchment areas. CAMELS-IND incorporates these 472 catchments in Peninsular India (Fig. 1b), for which we extracted daily meteorological time series and catchment attributes for large-sample hydrological studies.

Peninsular India is a large region situated between the Western Ghats and the Eastern Ghats, extending south of the Vindhya Range (Fig. 1b). Elevation ranges from 0 to 2600 m above mean sea level, with a mean of 600 m, and the terrain slopes eastward. The Western Ghats, also known as the Sahyadri hills, are a prominent landform in this region and play a crucial role in controlling moisture movement during the southwest monsoon. The Palghat (Palakkad) Gap, a narrow break in the Western Ghats, is a geological shear zone representing a weak area in the Earth's crust. A network of brooks and creeks in this gap forms the west-flowing Bharathapuzha River, the second-largest river in Kerala. The gap also influences weather patterns in Peninsular India by allowing moisture-laden southwest monsoon winds to enter the state of Tamil Nadu, moderating summer temperatures and increasing rainfall in the region. Other major landforms in Peninsular India include the Eastern Ghats, a discontinuous mountain range along the Bay of Bengal coast. These mountains are eroded and intersected by the major rivers of Peninsular India, namely the Mahanadi, Godavari, Krishna, and Cauvery, which form large deltas with nutrient-rich soils east of the Eastern Ghats (Fig. 1b). The Maikal Range in the north is the source of the Narmada River. Figure 1b illustrates the major river basins and gauge locations in Peninsular India.

Figure 2

Streamflow data availability. (a) Line plot with markers showing the number of catchments with less than 20 % missing data for each hydrological year (starting on 1 June). (b) Data availability (as a percentage of total duration) for each gauge station, indicating that 228 catchments have streamflow data available for over 30 % of the period.

[Figure omitted. See PDF]

3.2 Hydrological time series

Daily streamflow and water level observations for Indian catchments are publicly accessible via the online portal India-WRIS (https://indiawris.gov.in/wris/, last access: 18 November 2024). To ensure data quality, the data on India-WRIS undergo primary validation through the Surface Water Data Entry System (SWDES) and subsequent processing for standardization. The standard operating procedures and data processing protocols are described in detail in various user manuals (Lohani, 2012). Users can navigate the portal to select the data source (the agency providing river flow and water level data) and the location (such as river basin and gauge name) and then download river flow and water level data as an Excel (.xlsx) file. Currently, India-WRIS limits each download to a maximum of 1 year, so a long-term time series must be assembled by downloading 1 year at a time and combining the files. This process is tedious but necessary to acquire river flow data for Indian catchments. Following this process, we compiled the available streamflow observations from 1 January 1980 to 31 December 2020 from India-WRIS and provide them in the CAMELS-IND dataset. We performed a primary validation of the flow data for negative values and outliers and found no such anomalies. Our preliminary analysis shows that 228 catchments have streamflow data for over 30 % of the period between 1980 and 2020 (Fig. 2b). Another 60 catchments have partial streamflow records covering 2.4 % (approximately 1 year) to 30 % (about 12 years) of the period. Although these records may not meet the threshold for extensive hydrological analysis, they offer valuable opportunities for pseudo-ungauged studies and for model testing or validation in data-limited settings. Note that the India-WRIS portal was launched only in July 2019.
Since then, continuous efforts have been made to digitize the available data and update the information on the portal, and we anticipate that observations from the remaining gauges will become available for download over time. Therefore, we extracted catchment mean meteorological forcings and static attributes for all 472 catchments. To facilitate immediate use by those who specifically require catchments with streamflow observations, we also provide a subset of CAMELS-IND containing the 228 catchments with sufficient streamflow data.
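The download-and-merge workflow and the availability criteria used above (more than 30 % coverage over 1980–2020 for the 228-catchment subset, and hydrological years starting on 1 June with less than 20 % missing data, as in Fig. 2) can be sketched compactly. This is a minimal illustration, assuming each yearly India-WRIS download has already been parsed from .xlsx into a Python dict mapping `datetime.date` to daily flow, with missing values stored as `None` or NaN; all function names are hypothetical, and this is not the authors' processing code.

```python
import math
from datetime import date, timedelta

def is_valid(q):
    """Treat None and NaN as missing observations."""
    return q is not None and not math.isnan(q)

def merge_yearly_downloads(chunks):
    """Combine per-year {date: flow} dicts into one record (hypothetical helper).
    If two chunks contain the same date, the later chunk wins."""
    merged = {}
    for chunk in chunks:
        merged.update(chunk)
    return merged

def negative_values(series):
    """Dates with physically impossible (negative) flows."""
    return sorted(d for d, q in series.items() if is_valid(q) and q < 0)

def availability(series, start=date(1980, 1, 1), end=date(2020, 12, 31)):
    """Fraction of days in [start, end] that have a valid observation."""
    n_days = (end - start).days + 1
    n_valid = sum(1 for d, q in series.items() if start <= d <= end and is_valid(q))
    return n_valid / n_days

def complete_hydro_years(series, first_year=1980, last_year=2019, max_missing=0.2):
    """Hydrological years (1 June to 31 May, labeled by their start year)
    in which less than `max_missing` of the days are missing."""
    years = []
    for y in range(first_year, last_year + 1):
        start, end = date(y, 6, 1), date(y + 1, 5, 31)
        n_days = (end - start).days + 1
        n_valid = sum(1 for i in range(n_days)
                      if is_valid(series.get(start + timedelta(days=i))))
        if 1 - n_valid / n_days < max_missing:
            years.append(y)
    return years
```

Under these assumptions, a gauge would qualify for the streamflow subset if `availability(series) > 0.3`.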

4 Meteorological forcings

For large-sample studies, meteorological time series are often extracted from gridded datasets (Fowler et al., 2021). In CAMELS-IND, we extracted daily time series of 19 meteorological variables (listed in Table A1) from gridded datasets, using nationwide products wherever available, for the 41-year period from 1 January 1980 to 31 December 2020. We used the IMD gridded precipitation (0.25° spatial resolution) (Pai et al., 2014) and maximum and minimum temperature (1° spatial resolution) (Srivastava et al., 2009) datasets, which are the only available national datasets of these variables for India and are widely used. The IMD precipitation dataset is based on rainfall observations from 6995 rain gauge stations across India and provides an accurate representation of the spatial distribution of rainfall (Pai et al., 2014). The IMD temperature dataset is based on observations from 395 quality-controlled observatories across India, which are interpolated onto a grid using Shepard's angular distance weighting method (Srivastava et al., 2009).
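Shepard's angular distance weighting extends inverse distance weighting with a term that increases the weight of stations that are not "shadowed" by other stations lying in the same direction from the grid point. The sketch below uses the commonly stated form of Shepard's weights, $w_i = s_i^2 (1 + t_i)$ with $t_i = \sum_{j \neq i} s_j (1 - \cos\theta_{ij}) / \sum_{j \neq i} s_j$, where $\theta_{ij}$ is the angle at the grid point between stations $i$ and $j$. It is deliberately simplified: it assumes planar coordinates and uses $s_i = 1/d_i$ without a search radius or distance taper. It illustrates the technique and is not IMD's gridding code.

```python
import math

def shepard_adw(target, stations, values):
    """Estimate the value at one grid point from scattered station values.

    target:   (x, y) of the grid point in planar coordinates (e.g., km)
    stations: list of (x, y) station coordinates
    values:   observations at the stations, in the same order
    """
    tx, ty = target
    dx = [sx - tx for sx, _ in stations]
    dy = [sy - ty for _, sy in stations]
    d = [math.hypot(a, b) for a, b in zip(dx, dy)]
    # A station located exactly at the grid point determines the value.
    for di, zi in zip(d, values):
        if di == 0:
            return zi
    s = [1.0 / di for di in d]  # simple inverse-distance term (no search radius)
    weights = []
    for i in range(len(stations)):
        num = den = 0.0
        for j in range(len(stations)):
            if j == i:
                continue
            # cosine of the angle at the grid point between stations i and j
            cos_ij = (dx[i] * dx[j] + dy[i] * dy[j]) / (d[i] * d[j])
            num += s[j] * (1.0 - cos_ij)
            den += s[j]
        t_i = num / den if den > 0 else 0.0  # angular (isolation) term
        weights.append(s[i] ** 2 * (1.0 + t_i))
    return sum(w * z for w, z in zip(weights, values)) / sum(weights)
```

Because all weights are positive, the estimate is always a weighted average of the station values, so it never leaves their range.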

Daily time series of the surface downward long-wave and short-wave radiation flux, the $u$ and $v$ components of wind (at 10 m), relative humidity (at 2 m), evaporation rates from the soil surface and canopy, and soil moisture in four layers (0–0.1, 0.1–0.35, 0.35–1, and 1–3 m below ground) were extracted from the IMDAA dataset (Rani et al., 2021). IMDAA is a regional reanalysis developed by Weather and Climate Modelling under the Ministry of Earth Sciences, India. With a resolution of approximately 12 km, it is currently the highest-resolution regional reanalysis available for the Indian monsoon region and offers improved reliability and accuracy (Rani et al., 2021; Ashrit et al., 2020). Because no national datasets of actual and potential evapotranspiration (AET and PET) are available for India, we obtained daily AET and PET time series from the Global Land Evaporation Amsterdam Model (GLEAM) (Miralles et al., 2011). For comparison, we also extracted daily PET time series from Singer et al. (2021), currently the highest-resolution (0.1°) gridded PET dataset, which was developed from the ERA5-Land reanalysis. For all meteorological variables, daily catchment mean time series were calculated as area-weighted averages over each catchment. The catchment-wise meteorological time series are provided in the folder catchment_mean_forcings of the CAMELS-IND dataset (Mangukiya et al., 2024).
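The area-weighted averaging step can be sketched as follows. The sketch assumes that the fraction of each grid cell lying inside the catchment has already been computed (e.g., by intersecting the catchment polygon with the grid in a GIS) and that the grid is a regular latitude–longitude grid, on which cell area is proportional to the cosine of latitude. Skipping missing cell values and renormalizing the remaining weights is an assumption of this example, and the function names are hypothetical rather than part of the CAMELS-IND code.

```python
import math

def cell_weights(cells):
    """cells: (latitude_deg, overlap_fraction) for each grid cell touching the
    catchment. On a regular lat-lon grid, cell area scales with cos(latitude),
    so the catchment area inside each cell is proportional to frac * cos(lat)."""
    return [frac * math.cos(math.radians(lat)) for lat, frac in cells]

def catchment_daily_means(daily_values, weights):
    """daily_values: one list of cell values per day, aligned with `weights`;
    None marks a missing value. Returns one area-weighted mean per day
    (None if no cell has a valid value on that day)."""
    means = []
    for values in daily_values:
        pairs = [(w, v) for w, v in zip(weights, values) if v is not None and w > 0]
        total_w = sum(w for w, _ in pairs)
        means.append(sum(w * v for w, v in pairs) / total_w if total_w > 0 else None)
    return means
```

For a 0.25° or 1° grid spanning a few degrees of latitude, the cosine factor changes the weights only slightly, but it costs nothing to include.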

5 Catchment attributes

In CAMELS-IND, we compiled and calculated 211 catchment attributes representing location and topography, climate, hydrological signatures, land use and land cover (LULC), soil and geology, and anthropogenic influences. Table 1 summarizes the file names of the CAMELS-IND dataset and the attributes provided in each file. The CWC has divided India into 22 basins, each with a unique basin code (India-WRIS, 2012). In CAMELS-IND, we created a five-digit gauge station identifier, in which the first two digits are the CWC basin code and the last three digits are the station number, and used it as the gauge ID throughout the dataset. For each gauge ID, we provide the station name as given in the CWC database (CWC, 2021) and the names of the river/tributary and basin on which the station is located (Table A2). For ease of use, we also provide the GHI station ID and the GHI-assigned group, so that the catchment attributes can be linked to the metadata in the GHI dataset (Goteti, 2023).
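The gauge ID convention can be expressed as two small helpers. This is a hypothetical sketch; the basin code and station number used in the example are arbitrary and do not refer to a real CWC station. One practical point: if attribute files are read with a tool that infers column types, any ID beginning with 0 (i.e., a basin code below 10) may be parsed as an integer and lose its leading zero, so the parser pads IDs back to five digits.

```python
def make_gauge_id(basin_code: int, station_number: int) -> str:
    """Two-digit CWC basin code followed by a three-digit station number."""
    if not (0 <= basin_code <= 99 and 0 <= station_number <= 999):
        raise ValueError("basin code must fit in 2 digits, station number in 3")
    return f"{basin_code:02d}{station_number:03d}"

def split_gauge_id(gauge_id) -> tuple:
    """Recover (basin_code, station_number) from a str or int gauge ID."""
    s = str(gauge_id).zfill(5)  # restore a leading zero lost in int parsing
    if len(s) != 5 or not s.isdigit():
        raise ValueError(f"not a five-digit gauge ID: {gauge_id!r}")
    return int(s[:2]), int(s[2:])
```

For example, `make_gauge_id(3, 21)` gives `"03021"`, and both `split_gauge_id("03021")` and `split_gauge_id(3021)` give `(3, 21)`.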

Table 1

Summary of 211 catchment attributes provided in CAMELS-IND.

File name	Attribute description
camels_ind_name	7 attributes (Table A2) representing gauge name and identifier
camels_ind_topo	16 attributes (Table A3) representing location and topography
camels_ind_clim	42 attributes (Table A4) representing climate indices
camels_ind_hydro	73 attributes (Table A5) representing hydrological signatures
camels_ind_land	13 attributes (Table A6) representing land cover characteristics
camels_ind_soil	28 attributes (Table A7) representing soil characteristics
camels_ind_geol	7 attributes (Table A8) representing geological characteristics
camels_ind_anth	25 attributes (Table A9) representing anthropogenic influence in the catchment

Figure 3

Topographic characteristics of catchments in Peninsular India. The histograms depict the frequency distribution of catchments across the bins. (a) Catchment area (in km²), (b) catchment mean elevation (in meters above mean sea level), (c) catchment mean slope (in percentage), and (d) catchment mean drainage path slope (in m km⁻¹).

[Figure omitted. See PDF]

Figure 4

Climate indices for catchments in Peninsular India. The histograms depict the frequency distribution of catchments across the bins. (a) Mean annual precipitation, (b) variability in annual precipitation (higher values indicate greater variability), (c) frequency of days with precipitation $\geq 5$ times the mean daily precipitation, (d) average number of consecutive days with precipitation $\geq 5$ times the mean daily precipitation, (e) frequency of days with precipitation $< 1$ mm d⁻¹, (f) average number of consecutive days with precipitation $< 1$ mm d⁻¹, (g) maximum number of consecutive days with precipitation $< 1$ mm d⁻¹, and (h) aridity index (P/PET).

[Figure omitted. See PDF]

5.1 Location and topography

The attributes representing the location and catchment area of each gauge ID are compiled from both the CWC and GHI datasets (Table A3). However, for many gauge stations, the CWC documents spurious gauge locations and catchment areas (Goteti, 2023); we therefore used the corrected locations from the GHI dataset to plot the gauge locations in this work. Elevation and slope, which are key controls on catchment behavior (Addor et al., 2017), were extracted from the 3 arcsec ($\sim 90$ m) resolution digital elevation model (DEM) of the Shuttle Radar Topography Mission (SRTM) (Farr et al., 2007). The catchment areas range from 125.7 to 308 433.8 km², with quartile values of 1095.38 km² (first quartile), 3042.2 km² (second quartile), and 11 990.63 km² (third quartile). Figure 3a shows the distribution of catchment areas and highlights that 131 catchments are larger than 10 000 km². For such large catchments, the mean elevation is less meaningful because of spatial heterogeneity. Moreover, the west-flowing rivers from Tadri to Kanyakumari originate in steep mountains and flow through plains to the Arabian Sea, resulting in lower mean elevations and higher slopes (Fig. 3b–c). We therefore also computed the minimum, maximum, and median catchment elevation and slope for all gauges using the zonal statistics tool of QGIS to represent the spatial heterogeneity of topographic features. The catchment mean elevation ranges from 58.04 to 1687.24 m, with quartile values of 361.04 m (first quartile), 470.37 m (second quartile), and 617.9 m (third quartile), while the mean slope ranges from 1.07 % to 32.15 %, with quartile values of 4.11 % (first quartile), 6.23 % (second quartile), and 10.02 % (third quartile).
The catchment mean drainage path slope was also estimated using the SRTM DEM and the HydroSHEDS river network (Lehner and Grill, 2013) (Fig. 3d); it ranges from 1.22 to 74.88 m km⁻¹, with mean and median values of 8.93 and 6.35 m km⁻¹, respectively. Overall, the topographic attributes show that high-altitude catchments with moderate to steep slopes are located in the Western and Eastern Ghats, while catchments in central India have gentler slopes.
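The slope calculation and the zonal statistics step can be sketched in plain Python. The sketch assumes the DEM has been reprojected to a grid of square cells with a known size in metres (SRTM is distributed in geographic coordinates, so this reprojection is an assumption of the example) and computes slope in percent with central differences, one of several finite-difference schemes; GDAL's `gdaldem slope`, for instance, defaults to Horn's method. The function names are hypothetical, and the sketch only mirrors the statistics that QGIS reports.

```python
import math
import statistics

def slope_percent(dem, cell_size):
    """Slope (%) at interior cells of a DEM given as a list of rows (metres),
    using central differences; edge cells are returned as None."""
    nrow, ncol = len(dem), len(dem[0])
    out = [[None] * ncol for _ in range(nrow)]
    for r in range(1, nrow - 1):
        for c in range(1, ncol - 1):
            dzdx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dzdy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            out[r][c] = 100.0 * math.hypot(dzdx, dzdy)  # rise over run, in %
    return out

def zonal_stats(grid, mask):
    """Min, max, mean, and median of grid values where mask is True."""
    vals = [v for row, mrow in zip(grid, mask)
            for v, m in zip(row, mrow) if m and v is not None]
    return {
        "min": min(vals),
        "max": max(vals),
        "mean": statistics.mean(vals),
        "median": statistics.median(vals),
    }
```

Applying `zonal_stats` to both the DEM and the slope grid with the same catchment mask yields the elevation and slope summaries described above.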

5.2 Climate indices

We computed climate indices similar to those of Addor et al. (2017), representing both mean conditions and extreme events, from the meteorological time series described in Sect. 4. Additionally, we calculated monthly and annual precipitation variability, precipitation uniformity, asynchronicity, and the maximum number of consecutive days of extreme events and their timing (Table A4). To compute aridity, we used the ratio of mean annual precipitation to PET, following the approach of Addor et al. (2017). We also derived an alternative aridity index as the ratio of the deficit between potential and actual evapotranspiration to PET, and, as an additional reference, we extracted the spatially averaged aridity index from Trabucco and Zomer (2018). Precipitation uniformity indicates how uniformly annual precipitation is distributed across the days of a year and is estimated by relative entropy, a metric proposed by Feng et al. (2013). A value of 0 indicates that all days receive equal precipitation, whereas a value of 1 indicates that all of the annual precipitation fell on a single day (Dey and Mujumdar, 2019). The asynchronicity index measures the relative magnitude and phase differences between long-term monthly precipitation and potential evapotranspiration (Feng et al., 2019). High-precipitation days are days on which precipitation is at least 5 times the mean daily precipitation, and low-precipitation (dry) days are days with less than 1 mm d⁻¹ of precipitation; their frequencies are the numbers of such days per year. The average durations of high and low precipitation are the average lengths of runs of consecutive high-precipitation days and consecutive dry days, respectively.
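The two aridity indices and the precipitation uniformity can be illustrated with a short sketch. For uniformity, the sketch computes the relative entropy of one year's daily precipitation with respect to a uniform distribution, $D = \sum_i p_i \ln(p_i / (1/n))$, and divides it by $\ln n$ so that the result spans the 0 to 1 range described above. The division by $\ln n$, and whether the index is evaluated on daily or monthly totals, are assumptions of this example rather than a statement of how CAMELS-IND implements it; the function names are hypothetical.

```python
import math

def aridity_p_over_pet(p_mean, pet_mean):
    """P/PET: lower values indicate drier catchments."""
    return p_mean / pet_mean

def aridity_deficit(pet_mean, aet_mean):
    """(PET - AET)/PET: share of evaporative demand that is not met."""
    return (pet_mean - aet_mean) / pet_mean

def precip_uniformity(daily_p):
    """Relative entropy of a year's daily precipitation with respect to a
    uniform distribution, divided by log(n) so that 0 means equal precipitation
    on every day and 1 means all precipitation on a single day."""
    n = len(daily_p)
    total = sum(daily_p)
    if n < 2 or total <= 0:
        return float("nan")  # undefined for a completely dry year
    # D = sum_i p_i * log(p_i / (1/n)), with 0 * log(0) taken as 0
    d = sum((x / total) * math.log(n * x / total) for x in daily_p if x > 0)
    return d / math.log(n)
```

A year with rain on only 100 of 365 days falls between the two extremes, which is the typical situation for monsoon-dominated catchments.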
The timing of high and low precipitation is defined as the season in which most high- and low-precipitation events occur (monsoon: June, July, August, and September; pre-monsoon: January, February, March, April, and May; post-monsoon: October, November, and December).
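The event-based indices defined above (frequency, mean and maximum duration, and seasonal timing of high-precipitation and dry days) can be sketched as follows. The sketch assumes a gap-free daily series supplied together with the calendar month of each day; the thresholds (5 times the mean daily precipitation and 1 mm d⁻¹) and the season definitions follow the text, while the function names are hypothetical.

```python
from collections import Counter

# Season definitions from the text (month number -> season)
SEASONS = {
    **{m: "monsoon" for m in (6, 7, 8, 9)},
    **{m: "pre-monsoon" for m in (1, 2, 3, 4, 5)},
    **{m: "post-monsoon" for m in (10, 11, 12)},
}

def runs(flags):
    """Lengths of consecutive runs of True values."""
    lengths, current = [], 0
    for f in flags:
        if f:
            current += 1
        elif current:
            lengths.append(current)
            current = 0
    if current:
        lengths.append(current)
    return lengths

def event_stats(months, precip, kind):
    """Frequency (days/yr), mean and max duration (d), and dominant season of
    high-precipitation ("high") or dry ("low") days in a gap-free daily series."""
    mean_p = sum(precip) / len(precip)
    if kind == "high":
        flags = [p >= 5 * mean_p for p in precip]
    elif kind == "low":
        flags = [p < 1.0 for p in precip]
    else:
        raise ValueError("kind must be 'high' or 'low'")
    run_lengths = runs(flags)
    seasons = Counter(SEASONS[m] for m, f in zip(months, flags) if f)
    return {
        "freq": sum(flags) / (len(precip) / 365.25),
        "mean_duration": sum(run_lengths) / len(run_lengths) if run_lengths else 0.0,
        "max_duration": max(run_lengths, default=0),
        "timing": seasons.most_common(1)[0][0] if seasons else None,
    }
```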

Figure 4 shows the distribution of selected climate indices. Mean annual precipitation is high ($> 2500$ mm) in the Western Ghats and decreases ($< 1000$ mm) towards the central part of the region (Fig. 4a). The northern and eastern parts of the region receive moderate precipitation of 1500 to 2500 mm. The variability in annual precipitation and the frequency of high-precipitation days are notably higher (coefficient of variation $> 0.8$ and more than 21 d with precipitation $\geq 5$ times the mean daily precipitation) in the Mahi, Narmada, Pennar, Sabarmati, and Tapi basins (Fig. 4b–c). High-precipitation events are concentrated in the monsoon season over most of the region and in the post-monsoon season in its southern part. This highlights the dominance of the southwest monsoon (June to September) in the region and the influence of the northeast monsoon on the southeastern part during winter (Das et al., 2022; Das and Jain, 2023). The catchments along the west coast experience prolonged high precipitation in the southwest monsoon season and exhibit evaporation rates of more than 1.2 mm d⁻¹ (Fig. 4d). Because most precipitation in India falls during the southwest monsoon, most Indian catchments experience more than 210 dry days per year (Fig. 4e). Moreover, the Mahi, Narmada, Sabarmati, and Tapi basins, along with the catchments of west-flowing rivers between the Tapi and Tadri basins, show extreme seasonality (Rai and Dimri, 2020), receiving most of their precipitation in 1–2 months and thus experiencing prolonged dry periods (Fig. 4f–g). The catchments along the southwest coast and in the eastern part of Peninsular India are more humid than the catchments of the Godavari, Krishna, Mahi, Pennar, Sabarmati, and Tapi basins (Fig. 4h).
A sharp transition in the aridity index is observed across the Western Ghats, reflecting increased precipitation on the windward side and decreased precipitation in the rain shadow region on the leeward side. A distinct north–south pattern is observed in the asynchronicity index between long-term precipitation and PET, with a strong out-of-phase relationship in the northern and central parts of Peninsular India and an in-phase relationship in the southern part (Fig. A1). CAMELS-IND also provides mean indices for temperature, relative humidity, radiation flux, wind speed, and soil moisture to give a comprehensive picture of the climatic conditions over Peninsular India. PET is moderate (1000 to 1250 mm) in the central and northern parts and exceeds 1250 mm in the lower Western Ghats. AET shows a similar pattern, with values below 750 mm in the central and northern regions and over 1250 mm in the lower Western Ghats (Fig. A1).

5.3 Hydrological signatures

The hydrological signatures were computed for the 228 catchments with streamflow data available for at least 30 % of the period between 1980 and 2020. As in Addor et al. (2017), CAMELS-IND includes signatures representing mean flow and extreme flow events (Table A5). Because precipitation in India is strongly seasonal, we also computed seasonal flow and its variability, including flow quartiles for the southwest monsoon season. For these seasonal signatures, we also included gauges with streamflow observations during the relevant season, provided that less than 20 % of the values were missing in every month. Streamflow generally comprises two components, baseflow and quickflow. The baseflow index, the ratio of long-term baseflow to long-term total flow, is estimated using the method described by Ladson et al. (2013), as implemented in TOSSH (Toolbox for Streamflow Signatures in Hydrology) (Gnann et al., 2021); the higher the baseflow index, the larger the contribution of baseflow to total streamflow. The slope of the flow duration curve (FDC), which characterizes streamflow variability, is calculated between the log-transformed 33rd and 66th percentiles of daily streamflow over the observation period (Chouaib et al., 2018; Yokoo and Sivapalan, 2011). A high slope indicates highly variable streamflow due to pronounced seasonality or a rapid response to precipitation events. Streamflow elasticity quantifies the sensitivity of mean annual streamflow to changes in mean annual precipitation (Sankarasubramanian et al., 2001): an elasticity of $m$ indicates that a 1 % change in mean annual precipitation results in an $m$ % change in mean annual streamflow.
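The baseflow index, FDC slope, and streamflow elasticity can be sketched as follows. The baseflow separation uses the Lyne–Hollick digital filter with α = 0.925 and three alternating passes (forward, backward, forward), the configuration commonly associated with Ladson et al. (2013); unlike their procedure, this sketch does not reflect values at the start and end of the record to damp warm-up effects, and it is not the TOSSH code. The FDC slope assumes exceedance probabilities, a simple nearest-rank percentile, and positive flows at both percentiles (it returns NaN for zero flow). The elasticity uses the nonparametric estimator of Sankarasubramanian et al. (2001), the median over years of $\frac{Q_t - \bar{Q}}{P_t - \bar{P}} \cdot \frac{\bar{P}}{\bar{Q}}$. All function names are hypothetical.

```python
import math
import statistics

def _lyne_hollick_pass(x, alpha):
    """One Lyne-Hollick filter pass over series x; returns the baseflow part."""
    quick = [0.0] * len(x)
    for t in range(1, len(x)):
        qf = alpha * quick[t - 1] + 0.5 * (1.0 + alpha) * (x[t] - x[t - 1])
        quick[t] = min(max(qf, 0.0), x[t])  # constrain 0 <= baseflow <= flow
    return [xt - qt for xt, qt in zip(x, quick)]

def baseflow_index(q, alpha=0.925, n_passes=3):
    """BFI = total baseflow / total flow after alternating filter passes."""
    b = list(q)
    for k in range(n_passes):
        if k % 2 == 0:
            b = _lyne_hollick_pass(b, alpha)              # forward pass
        else:
            b = _lyne_hollick_pass(b[::-1], alpha)[::-1]  # backward pass
    return sum(b) / sum(q)

def fdc_slope(q):
    """Slope of the FDC between flows exceeded 33 % and 66 % of the time (log space)."""
    s = sorted(q, reverse=True)  # descending order: position ~ exceedance probability
    q33, q66 = s[int(0.33 * len(s))], s[int(0.66 * len(s))]
    if q66 <= 0:
        return float("nan")      # log undefined for zero flow
    return (math.log(q33) - math.log(q66)) / (0.66 - 0.33)

def streamflow_elasticity(p_annual, q_annual):
    """Median of ((Q_t - Qbar)/(P_t - Pbar)) * (Pbar/Qbar) over years."""
    p_bar, q_bar = statistics.mean(p_annual), statistics.mean(q_annual)
    ratios = [((q - q_bar) / (p - p_bar)) * (p_bar / q_bar)
              for p, q in zip(p_annual, q_annual) if p != p_bar]
    return statistics.median(ratios)
```

A constant flow series has no quickflow, so its baseflow index is 1, while a series with sharp peaks yields a value below 1.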
In addition, we estimated the runoff ratio, the ratio of long-term mean daily flow to long-term mean daily precipitation, which measures the fraction of precipitation that, on average, is converted to streamflow. Streamflow uniformity, measured using the Gini coefficient (Gudmundsson et al., 2018), ranges from 0 to 1: 0 indicates that flow is distributed uniformly throughout the year, 1 indicates that all flow occurs on a single day, and values in between represent intermediate cases. Beyond these measures of streamflow variability, we also quantified the behavior of extreme streamflow conditions, using the 95th and 25th percentiles of daily flow over the observation period as the high-flow and low-flow thresholds, respectively. Moreover, we computed approximately 40 indices of hydrological alteration, representing monthly water availability and variability, the magnitude and timing of annual extremes, and the frequency and rate of change in flow conditions. The main limitations of the derived hydrological signatures are that (1) many of them are related to catchment size and (2) the causal factors of extreme flow conditions are not considered.
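The runoff ratio, the Gini-based streamflow uniformity, the percentile-based high- and low-flow frequencies, and the rate-of-change indices (mean rise and fall rates and number of hydrologic reversals) can be sketched as follows. The percentile function interpolates linearly between order statistics, and days with no change in flow are skipped when counting reversals; both choices, and the function names, are assumptions of this example rather than the exact CAMELS-IND definitions.

```python
import statistics

def runoff_ratio(q_daily, p_daily):
    """Long-term mean daily flow over long-term mean daily precipitation (both mm/d)."""
    return statistics.mean(q_daily) / statistics.mean(p_daily)

def gini(q):
    """Gini coefficient of daily flows: 0 = equal flow on every day,
    (n - 1)/n (close to 1) = all flow on a single day."""
    x = sorted(q)
    n, total = len(x), sum(x)
    return sum((2 * i - n - 1) * xi for i, xi in enumerate(x, start=1)) / (n * total)

def percentile(values, pct):
    """Percentile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def high_low_flow_days_per_year(q):
    """Days per year above the 95th and below the 25th percentile of daily flow."""
    hi_thr, lo_thr = percentile(q, 95), percentile(q, 25)
    n_years = len(q) / 365.25
    n_high = sum(1 for x in q if x > hi_thr)
    n_low = sum(1 for x in q if x < lo_thr)
    return n_high / n_years, n_low / n_years

def rise_fall_reversals(q):
    """Mean positive and negative day-to-day changes and number of reversals
    (changes between rising and falling); zero-change days are skipped."""
    diffs = [b - a for a, b in zip(q, q[1:])]
    rises = [d for d in diffs if d > 0]
    falls = [d for d in diffs if d < 0]
    rising = [d > 0 for d in diffs if d != 0]
    reversals = sum(1 for a, b in zip(rising, rising[1:]) if a != b)
    return {
        "rise_rate": statistics.mean(rises) if rises else 0.0,
        "fall_rate": statistics.mean(falls) if falls else 0.0,
        "reversals": reversals,
    }
```

For a regulated river, frequent small releases can produce many reversals even when precipitation is strongly seasonal, which is the pattern discussed for Fig. 5i–k.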

Figure 5

Streamflow characteristics for catchments in Peninsular India. The histograms depict the frequency distribution of catchments across the bins. (a) Mean daily streamflow, (b) mean daily streamflow of the southwest monsoon season, (c) runoff ratio (q_mean/p_mean), (d) variability in daily streamflow, (e) mean number of consecutive low-flow days (flow below the 25th percentile of daily flow), (f) frequency of low flows in a year, (g) mean number of consecutive high-flow days (flow above the 95th percentile of daily flow), (h) frequency of high flows, (i) mean of the positive differences between consecutive flow values, (j) mean of the negative differences between consecutive flow values, (k) mean number of hydrologic reversals (i.e., number of peaks in the hydrograph), and (l) mean annual flow volume.

[Figure omitted. See PDF]

The mean streamflow pattern closely follows the spatial pattern of precipitation. The catchments of west-flowing rivers and the Brahmani, Baitarni, Godavari, Mahanadi, Narmada, and Subernarekha basins exhibit higher flows (> 1 mm d⁻¹) throughout the year, including the southwest monsoon season (Fig. 5a–b). Streamflow elasticity varies widely across the arid regions and less across the humid regions (Fig. A1), and streamflow is more sensitive to changes in precipitation in the arid regions. The streamflow uniformity (Gini coefficient) is higher in the central Eastern Ghats and delta regions and lower in the Western Ghats region (Fig. A1). The baseflow index is high and highly variable in the southern region, whereas its variability decreases in the central and northern parts of the region (Fig. A1). In addition, the sensitivity of streamflow to precipitation decreases with increasing baseflow index, highlighting the role of baseflow in sustaining flows. The catchments along the southwest coast have a high runoff ratio (> 0.5) and relatively low variability in daily flows (Fig. 5c–d). The majority of Indian catchments exhibit low flows for 90 to 120 d during the summer season (March–May), with 30 to 60 consecutive days of low flows (Fig. 5e–f). A similar pattern, but of shorter duration, is observed for high flows, indicating the influence of dams (Fig. 5g–h). Most dams in India are operated to store water from high precipitation during the southwest monsoon season and release it gradually during summer for irrigation and other water demands. Evidence of this can be seen in Fig. 5i–k, which shows a high number of hydrological reversals (> 100 per year) despite the seasonal precipitation pattern.
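The rate-of-change attributes in Fig. 5i–k can be illustrated with the following sketch. For one gap-free year of daily flows, it computes the mean positive and mean negative day-to-day differences and the number of reversals. It is an illustration under assumptions, not the dataset's implementation. Days with no change in flow are skipped when detecting reversals, and a reversal is counted at every change in the sign of the day-to-day difference, as in Indicators of Hydrologic Alteration-style analyses. Because the Fig. 5k caption equates reversals with peaks, the number of rising-to-falling transitions is reported separately. Annual values would then be averaged over all years. The function name is hypothetical.

```python
import numpy as np


def rate_of_change_attributes(q):
    """Rise rate, fall rate, reversals, and peaks for one gap-free year of daily flows."""
    q = np.asarray(q, dtype=float)
    dq = np.diff(q)                      # day-to-day differences
    rises = dq[dq > 0]
    falls = dq[dq < 0]
    signs = np.sign(dq[dq != 0])         # ignore days without change
    changes = signs[1:] != signs[:-1]    # sign switches between successive changes
    peaks = (signs[:-1] > 0) & (signs[1:] < 0)  # rising followed by falling
    return {
        "rise_rate": float(rises.mean()) if rises.size else 0.0,
        "fall_rate": float(falls.mean()) if falls.size else 0.0,
        "reversals": int(changes.sum()),
        "peaks": int(peaks.sum()),
    }
```

Counting every sign change roughly doubles the count compared with counting peaks only, so it matters which convention a comparison study uses.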

5.4 Land cover characteristics

Land cover attributes were extracted from the Sentinel-2 10 m LULC time series (Karra et al., 2021), which provides eight LULC classes: water, trees, flooded vegetation, crops, built-up areas, bare ground, snow cover, and rangeland. Because snow cover is absent in Peninsular Indian catchments, we excluded it. The temporal average of the seven remaining LULC classes was then extracted as fractions of the catchment area using the zonal histogram tool in QGIS (Table A6). Additionally, the spatiotemporally averaged (2001–2020) minimum and maximum leaf area index (LAI) of each catchment was extracted from the MCD15A2H MODIS/Terra+Aqua leaf area index/FPAR (fraction of absorbed photosynthetically active radiation) 8-day L4 Global 500 m SIN Grid V006 product (Myneni et al., 2015) to represent the vertical density of vegetation. The maximum LAI can help set the boundary conditions for evaporation from the canopy and for vegetation interception, while the difference between the maximum and minimum LAI represents the seasonal variation in LAI (Addor et al., 2017). The spatial variation in LULC classes indicates that the catchments of west-flowing rivers from Tadri to Kanyakumari, east-flowing rivers between Pennar and Kanyakumari, and the Cauvery basin have higher urban fractions (> 8 %). In comparison, more than 50 % of the catchment areas of the Godavari, Krishna, Mahi, Narmada, and Tapi basins are covered with agricultural land (Fig. 6a–b). Several large and medium reservoirs and lakes are present in these catchments to meet agricultural water demands, covering more than 2 % of the catchment area (Fig. 6c). The catchments along the southwest coast are mainly covered with trees and exhibit lower seasonal variability in LAI (Fig. 6d–f). In contrast, catchments in central India show high seasonal variability in LAI, primarily due to seasonal crops, as these catchments have a very low fraction of forest cover (Fig. 6d–f).
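The zonal-histogram step and the LAI attributes can be sketched with NumPy as follows. This is a hypothetical illustration, not the QGIS workflow used for CAMELS-IND. It makes three assumptions:

* The LULC raster has already been clipped to a boolean catchment mask on the same grid.
* The integer class codes in `LULC_CLASSES` match the product legend. They are assumed and should be checked against the product documentation.
* Fractions are normalized by the number of pixels in the retained classes.

For LAI, it assumes a (time, pixel) array of catchment pixels, takes each pixel's temporal minimum and maximum, and averages them over the catchment. Other orderings of the spatial and temporal averaging are possible.

```python
import numpy as np

# Hypothetical class codes; verify against the LULC product legend before use.
LULC_CLASSES = {1: "water", 2: "trees", 4: "flooded_vegetation", 5: "crops",
                7: "built_area", 8: "bare_ground", 9: "snow_ice", 11: "rangeland"}


def lulc_fractions(class_raster, catchment_mask, exclude=("snow_ice",)):
    """Fraction of catchment pixels in each retained LULC class."""
    pixels = np.asarray(class_raster)[np.asarray(catchment_mask, dtype=bool)]
    kept = {code: name for code, name in LULC_CLASSES.items() if name not in exclude}
    counts = {name: int(np.sum(pixels == code)) for code, name in kept.items()}
    total = sum(counts.values())  # pixels outside the retained classes are ignored
    return {name: (n / total if total else float("nan")) for name, n in counts.items()}


def lai_attributes(lai_stack):
    """Catchment mean of per-pixel temporal max/min LAI; lai_stack has shape (time, pixels)."""
    stack = np.asarray(lai_stack, dtype=float)
    lai_max = float(np.nanmean(np.nanmax(stack, axis=0)))
    lai_min = float(np.nanmean(np.nanmin(stack, axis=0)))
    return {"lai_max": lai_max, "lai_min": lai_min, "lai_diff": lai_max - lai_min}
```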

Figure 6

LULC characteristics for catchments in Peninsular India. The histograms depict the frequency distribution of catchments across the bins. (a–d) Fraction of built-up area, cropland, water and flooded vegetation, and tree cover, respectively. (e) Maximum leaf area index. (f) Difference between the maximum and minimum leaf area index.

[Figure omitted. See PDF]

5.5 Soil and geological characteristics

The soil attributes of each catchment were derived from global data sources (Table A7), as national soil datasets are currently either not freely available or not digitized. The average soil depth of each catchment was extracted from Pelletier et al. (2016), which provides the thickness of soil and sedimentary deposits at 30 arcsec resolution. The saturated hydraulic conductivity, available water storage, and fraction of organic matter content for both topsoil (0–30 cm) and subsoil (30–200 cm) were extracted from HiHydroSoil v2 at 250 m resolution (Simons et al., 2020). The available water storage capacity of the soil was extracted from Food and Agriculture Organization (FAO) soil data (Fischer et al., 2008). The fractions of sand, silt, clay, and gravel, the bulk density, and the organic carbon content for topsoil (0–30 cm) and subsoil (30–100 cm) were extracted from the Harmonized World Soil Database v2.0 (FAO and IIASA, 2023). The catchment mean of the annual average water table depth was extracted from Fan et al. (2013). Additionally, we extracted the major hydrologic soil group (HSG) from HiHydroSoil v2 (Simons et al., 2020). The HSG is used to derive the runoff curve number, which is used in hydrological modeling to estimate direct runoff. All soil attributes were computed using the zonal statistics tool in QGIS.
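The zonal statistics used for the soil attributes reduce to a masked mean for continuous variables and a majority (most frequent value) for categorical ones such as the HSG. The sketch below assumes each raster has been resampled to a grid shared with a boolean catchment mask, and it drops a user-supplied no-data value. It also treats pixels as equal-area. On a geographic grid, pixel area varies slightly with latitude, so an area-weighted mean would be more exact. The function names are illustrative and are not part of QGIS or the dataset.

```python
from collections import Counter

import numpy as np


def zonal_mean(raster, mask, nodata=None):
    """Unweighted mean of valid raster values inside the catchment mask."""
    values = np.asarray(raster, dtype=float)[np.asarray(mask, dtype=bool)]
    if nodata is not None:
        values = values[values != nodata]
    values = values[~np.isnan(values)]
    return float(values.mean()) if values.size else float("nan")


def zonal_majority(raster, mask, nodata=None):
    """Most frequent value inside the mask (ties resolved by first occurrence)."""
    values = np.asarray(raster)[np.asarray(mask, dtype=bool)].tolist()
    if nodata is not None:
        values = [v for v in values if v != nodata]
    return Counter(values).most_common(1)[0][0] if values else None
```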

The spatial variability in soil attributes shows that the catchments of the Mahanadi and lower Godavari basins have a high fraction of sand. In contrast, catchments along the southwest coast and in the Brahmani and Baitarni, Sabarmati, and Subernarekha basins have a high fraction of silt and gravel (Fig. 7a–d). The catchments of the Krishna, Narmada, Tapi, and upper Godavari basins have a high fraction of clay in the soil. Catchments of west-flowing rivers from Tapi to Tadri have more than 4 % organic carbon content (Fig. 7e). Of the 472 catchments, 320 have a soil depth of up to 2 m. Another 87 catchments have a soil depth of more than 3 m; these are mainly located along the east coast and in the lower Godavari, lower Narmada, and Sabarmati basins (Fig. 7f). The catchments in the upper part of Peninsular India, mainly in the Godavari, Mahanadi, Mahi, Narmada, Sabarmati, and Tapi basins, have soils with a lower available water storage capacity than the catchments along the southwest coast (Fig. 7g). The spatial variability in soil conductivity shows that catchments of the lower Krishna, Mahi, Pennar, and Sabarmati basins have high soil conductivity (> 5 cm d⁻¹) (Fig. 7h). Overall, the catchments of Peninsular India have moderate-to-high runoff potential.

Figure 7

Soil characteristics for catchments in Peninsular India. The histograms depict the frequency distribution of catchments across the bins. (a–e) Fraction of sand, silt, clay, gravel, and organic matter content in topsoil (0–30 cm), respectively. (f) Average thickness of soil and sedimentary deposits. (g) Available water storage capacity of topsoil (0–30 cm). (h) Mean saturated hydraulic conductivity of topsoil (0–30 cm).

[Figure omitted. See PDF]

The geological attributes (Table A8) were computed following Addor et al. (2017). The first and second most common geological classes, their respective proportions within the catchment, and the fraction of "carbonate sedimentary rocks" were extracted from the Global Lithological Map (GLiM) (Hartmann and Moosdorf, 2012) using the zonal statistics and zonal histogram tools in QGIS. The mean subsurface porosity and permeability of each catchment were derived from the GLobal HYdrogeology MaPS (GLHYMPS) (Gleeson et al., 2014). The spatial variability in subsurface porosity and permeability indicates that catchments in the Narmada and Sabarmati basins have relatively high porosity (> 0.1). Catchments in the Mahi, Narmada, Tapi, upper Godavari, and upper Krishna basins exhibit high permeability (> 0.73 m²) (Fig. 8a–b). The southern parts of the Peninsular region consist of a hard-rock aquifer system with low porosity and permeability. The Peninsular region is the oldest and largest geomorphic province of India. Seven dominant geological classes are identified in the Peninsular region: basic volcanic rocks, metamorphic rocks, acid plutonic rocks, siliciclastic sedimentary rocks, mixed sedimentary rocks, carbonate sedimentary rocks, and subordinate unconsolidated sediments. Of the 472 catchments, 179 have basic volcanic rocks and 176 have metamorphic rocks as the most common geological class, and most of these catchments consist of a single geological class (Fig. 8c–g). The rock types classified as basic volcanic rocks are basalts, tephrites, tholeiites, and lamprophyres (Hartmann and Moosdorf, 2012). The metamorphic class comprises a variety of rocks, from shales to gneiss and from amphibolite to quartzite. Groundwater movement in these two dominant classes is controlled by several factors. These are the presence and continuity of rock fractures, the depth of weathering, topography, and the nature and size of recharge and discharge areas.
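The most common and second most common geological classes, their fractions, and the carbonate fraction can all be obtained from a histogram of class labels within the catchment. The sketch below is illustrative only. It assumes the GLiM raster values inside the catchment have already been extracted as labels, and it assumes the GLiM two-letter codes as the encoding (for example `vb` for basic volcanic, `mt` for metamorphic, and `sc` for carbonate sedimentary rocks). The output keys are hypothetical names, not the CAMELS-IND column names.

```python
from collections import Counter


def geology_attributes(class_labels, carbonate_label="sc"):
    """Top-two geological classes and their catchment fractions, plus the carbonate fraction."""
    counts = Counter(class_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no geological pixels inside the catchment")
    ranked = counts.most_common(2)
    first_label, first_n = ranked[0]
    # Single-class catchments have no second class.
    second_label, second_n = ranked[1] if len(ranked) > 1 else (None, 0)
    return {
        "geol_1st_class": first_label,
        "geol_1st_class_frac": first_n / total,
        "geol_2nd_class": second_label,
        "geol_2nd_class_frac": second_n / total,
        "carbonate_frac": counts.get(carbonate_label, 0) / total,
    }
```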

Figure 8

Geological characteristics for catchments in Peninsular India. The histograms depict the frequency distribution of catchments across the bins. (a, b) Mean subsurface porosity and permeability. (c, d) Most common geological classes. (e, f) Fraction of the catchment area associated with the most common geological classes. (g) Fraction of the catchment area characterized as carbonate sedimentary rocks.

[Figure omitted. See PDF]

5.6 Anthropogenic influences

Catchments in India experience varying degrees of anthropogenic influence. Because of the seasonal rainfall pattern, water demands in the region are primarily met by dams. In CAMELS-IND, the degree of human intervention within each catchment is quantified through several attributes (Table A9):

* the number of dams;
* the year of construction of the first and the most recent dam;
* the total cumulative storage of dams;
* the fractions of this storage used for hydropower generation, flood control, irrigation, drinking water, flood storage, and navigation.

In addition, the reservoir index, i.e., the ratio of total storage volume to the multiyear mean annual streamflow volume, is estimated. The number of dams in each catchment and their cumulative storage volume were extracted and digitized from India-WRIS and the Global Reservoir and Dam Database (GRanD) (Lehner et al., 2011). The water stored in these dams is mainly used for urban and agricultural purposes. Since quantitative measurements of water demand are unavailable, we included decadal population density data (WorldPop and CIESIN, 2018) and the fractions of urban area and cropland (Roy et al., 2015) in CAMELS-IND as indirect measures.
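As a worked example of the reservoir index, the sketch below converts a long-term mean discharge in m³ s⁻¹ into a mean annual flow volume in Mm³ and divides the total dam storage by it. The conversion assumes 365.25 days per year. The text does not state which flow series or averaging period CAMELS-IND uses, so the inputs are left to the user, and the function names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 86_400  # assumed mean year length


def mean_annual_volume_mm3(mean_q_m3s):
    """Convert a long-term mean discharge (m3/s) to a mean annual volume (Mm3)."""
    return mean_q_m3s * SECONDS_PER_YEAR / 1e6


def reservoir_index(total_storage_mm3, mean_q_m3s):
    """Ratio of total dam storage to mean annual streamflow volume (dimensionless)."""
    annual_volume = mean_annual_volume_mm3(mean_q_m3s)
    return total_storage_mm3 / annual_volume if annual_volume > 0 else float("inf")


# A catchment with a mean flow of 100 m3/s yields about 3155.76 Mm3 per year,
# so a total storage of 1577.88 Mm3 gives a reservoir index of about 0.5.
print(round(reservoir_index(1577.88, 100.0), 3))
```

Values above 1 indicate that the dams can hold more than one average year of runoff, which is why the index is a useful proxy for the degree of flow regulation.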

Figure 9

Attributes representing anthropogenic influences for catchments in Peninsular India. The histograms depict the frequency distribution of catchments across the bins. (a) Total number of large and medium dams. (b) Total volume content of dams. (c, d) Decadal fraction of urban land cover in 1985 and 2005, respectively. (e) Mean population density. (f, g) Decadal fraction of cropland in 1985 and 2005, respectively. (h) Reservoir index.

[Figure omitted. See PDF]

The spatial distribution of large and medium dams (height > 10 m) across catchments shows significant regulation in the catchments of the Cauvery, Godavari, Krishna, Mahanadi, Mahi, Narmada, and Tapi basins (Fig. 9a–b). The number of dams within a catchment ranges from 0 to 1277, with a mean of 75 and a median of 9. The total storage capacity ranges from 0 to 59 929 Mm³, with a mean of 3796 Mm³ and a median of 246 Mm³. The decadal variation in urban land cover and population density reveals a notable increase in urbanization within the catchments of southern India from 1985 to 2015 (Fig. 9c–e). In contrast, the fraction of agricultural land remains relatively constant over the same period (Fig. 9f–g). The reservoir index, which indicates the impact of dams on streamflow, is higher in the Godavari, Krishna, and Cauvery basins, whereas most catchments in the Narmada basin and the Western Ghats region have lower values (Fig. 9h). The majority of the dams in this region serve irrigation purposes, whereas the dams in the lower Western Ghats of the Peninsular region are mainly used for hydroelectric power generation (Fig. A1).

6 Regionally trained LSTM-based hydrological model for streamflow prediction

We used the long short-term memory (LSTM)-based regional hydrological model applied to Indian catchments by Mangukiya et al. (2023) to predict daily streamflow for all 472 catchments. The LSTM architecture includes input, output, and forget gates and a memory cell, which enable the model to learn long-term dependencies in the input data (Hochreiter and Schmidhuber, 1997). We trained the regional model with daily meteorological time series and catchment attributes as inputs and daily streamflow as the target (Mangukiya et al., 2023; Mangukiya and Sharma, 2024). The inputs included daily precipitation, maximum and minimum temperature, solar radiation, wind speed, and relative humidity, along with catchment attributes representing topographic, land cover, soil, and geological characteristics. The model was trained on 159 catchments, selected so that each had at least 28 years of data between 1980 and 2020. The optimized hyperparameter values were adopted from Mangukiya et al. (2023). The model was trained from 1 January 1991 to 31 December 2015, validated from 1 January 1980 to 31 December 1990, and tested from 1 January 2016 to 31 December 2020. In addition to the 159 catchments, we tested the model's ability to generalize to 17 pseudo-ungauged catchments that were held out during training.
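For reference, a standard LSTM cell with a forget gate can be written as below. Here $x_t$ is the input vector on day $t$, $h_t$ the hidden state, $c_t$ the memory cell state, $\sigma$ the logistic sigmoid, and $\odot$ element-wise multiplication. This is the textbook formulation, not necessarily the exact configuration of Mangukiya et al. (2023). Two further details are assumptions here, although both are common in regional LSTM setups. First, the static catchment attributes are appended to the daily forcings in $x_t$ at every time step. Second, streamflow $\hat{q}_t$ is read out from $h_t$ through a linear layer.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
g_t &= \tanh\left(W_g x_t + U_g h_{t-1} + b_g\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh\left(c_t\right) \\
\hat{q}_t &= W_d h_t + b_d
\end{aligned}
```

The additive update of $c_t$, gated by $f_t$, is what allows information such as antecedent wetness or snow-free storage to persist over many time steps.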

The results indicate satisfactory model performance, with a median Nash–Sutcliffe efficiency (NSE) of 0.59 and 0.57 during the test and validation periods, respectively (Fig. 10a). The LSTM model also achieved the following median values (Fig. 10b):

* correlation of 0.8;
* percentage bias of −7.64 %;
* Kling–Gupta efficiency (KGE) of 0.62;
* root mean squared error (RMSE) of 121.5 m³ s⁻¹;
* low-flow bias (bottom 30 % of flows, FLV) of −1.4 %;
* high-flow bias (top 2 % of flows, FHV) of −15.72 %.

We also calculated the average RMSE between the observed and predicted flow duration curves (fdcRMSE) as an additional evaluation metric; the median fdcRMSE was 127.09 m³ s⁻¹. However, the LSTM model performed poorly in challenging catchments, such as those with many dams, non-perennial catchments, and catchments in arid and semi-arid climate zones. More details on the model's performance and limitations can be found in Mangukiya et al. (2023). The regional model trained on 159 catchments was used to predict daily streamflow for all 472 catchments from 1 January 1980 to 31 December 2020. Within CAMELS-IND, gauge-wise predicted streamflow series are provided in a compressed file named LSTM_pred_streamflow.zip. These predicted series are included to support deep-learning- and machine-learning-based hydrology research and can serve as a benchmark or baseline for developing and testing hydrological models.
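The evaluation metrics quoted above can be sketched in NumPy as follows. This is an illustrative implementation under stated assumptions, not the evaluation code behind Fig. 10:

* Time steps where either series is NaN are dropped.
* Percentage bias is taken as 100 × Σ(sim − obs)/Σobs, so negative values indicate underestimation. This sign convention is assumed.
* KGE combines the correlation, variability ratio, and bias ratio (Gupta et al., 2009).
* FHV is computed on the top 2 % of the sorted flows.
* fdcRMSE is the RMSE between the sorted observed and simulated flows.

FLV is omitted because its usual formulation operates on log-transformed flows and differs between implementations.

```python
import numpy as np


def _paired(obs, sim):
    """Drop time steps where either series is missing."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    ok = ~(np.isnan(obs) | np.isnan(sim))
    return obs[ok], sim[ok]


def nse(obs, sim):
    obs, sim = _paired(obs, sim)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


def kge(obs, sim):
    obs, sim = _paired(obs, sim)
    r = np.corrcoef(obs, sim)[0, 1]          # linear correlation
    alpha = sim.std() / obs.std()            # variability ratio
    beta = sim.mean() / obs.mean()           # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def pbias(obs, sim):
    obs, sim = _paired(obs, sim)
    return 100.0 * np.sum(sim - obs) / np.sum(obs)


def fhv(obs, sim, h=0.02):
    """Percent bias of the highest h fraction of flows (high segment of the FDC)."""
    obs, sim = _paired(obs, sim)
    n = max(1, int(round(h * obs.size)))
    o = np.sort(obs)[::-1][:n]
    s = np.sort(sim)[::-1][:n]
    return 100.0 * np.sum(s - o) / np.sum(o)


def fdc_rmse(obs, sim):
    """RMSE between sorted observed and simulated flows (flow duration curves)."""
    obs, sim = _paired(obs, sim)
    return float(np.sqrt(np.mean((np.sort(sim) - np.sort(obs)) ** 2)))
```

Because fdcRMSE compares sorted series, it ignores timing errors: a simulation with the right flow distribution but shifted peaks can score well on fdcRMSE and poorly on NSE.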

Figure 10

LSTM model performance. (a) Spatial distribution of NSE. (b) Performance metrics.

[Figure omitted. See PDF]

7 Preliminary assessment of dataset quality and uncertainty

The preliminary assessment presented here focuses only on the catchment mean meteorological time series provided in CAMELS-IND. The precipitation and maximum and minimum temperature time series were extracted from the IMD dataset. Goteti (2023) provided a detailed comparison of catchment annual precipitation series extracted from IMD and from the ECMWF Reanalysis (ERA). The results indicated a Pearson correlation coefficient greater than 0.75 for 31 catchments and between 0.5 and 0.75 for 331 of the 472 catchments. Lower correlations were found only in the hilly terrain of the southwestern part of Peninsular India. Moreover, Mahto and Mishra (2019) also observed general consistency between the ERA and IMD datasets. We extracted the time series of solar radiation, wind speed, relative humidity, and soil moisture from the IMDAA data. Figure 11a shows the Pearson correlation between catchment mean annual time series from two sources, for downward long-wave radiation, short-wave radiation, and wind speed. The first source is IMDAA. The second is the Global Land Data Assimilation System (GLDAS) Noah Land Surface Model L4 3-hourly 0.25° × 0.25°, using v2.0 for 1980–2014 (Beaudoing and Rodell, 2019) and v2.1 for 2014–2020 (Beaudoing and Rodell, 2020). For the majority of catchments, the IMDAA long-wave radiation is consistent with that from GLDAS, as indicated by high correlation coefficients. However, lower correlations were observed for short-wave radiation, particularly in the hilly terrain of the Mysore plateau and in the southern catchments. Similarly, wind speed exhibited discrepancies in a few catchments in the southern region and in the catchments of the upper Eastern Ghats, lower Godavari, Mahanadi, Brahmani, and Baitarni basins.

These discrepancies between data sources could be attributed to the different boundary conditions and forcings used to drive the underlying models (Rodell et al., 2004; Rani et al., 2021). Global reanalysis products provide a convenient data source, but their relatively coarse resolution (e.g., 25 km grid spacing) limits their ability to capture climate variations in mountainous regions accurately. In contrast, IMDAA, with its 10 km resolution, provides a more detailed representation of such variations. As demonstrated by Nayak et al. (2018), reanalysis products derived using India-specific boundary conditions and land use data performed better than GLDAS in capturing meteorological patterns in hilly areas.
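The comparison in Fig. 11a correlates catchment mean annual series from two products. The minimal sketch below assumes both daily series are already catchment-averaged, aligned on the same dates, and supplied together with an array of calendar years. It aggregates each series to annual means and computes the Pearson correlation with `numpy.corrcoef`. The function names are hypothetical.

```python
import numpy as np


def annual_means(years, values):
    """Average a daily series within each calendar year (NaNs ignored)."""
    years = np.asarray(years)
    values = np.asarray(values, dtype=float)
    unique_years = np.unique(years)
    means = np.array([np.nanmean(values[years == y]) for y in unique_years])
    return unique_years, means


def annual_correlation(years, series_a, series_b):
    """Pearson correlation between the annual means of two aligned daily series."""
    _, a = annual_means(years, series_a)
    _, b = annual_means(years, series_b)
    return float(np.corrcoef(a, b)[0, 1])
```

Annual aggregation smooths out day-to-day discrepancies, so a high annual correlation does not by itself imply agreement at the daily scale used by the hydrological model.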

To further evaluate the quality of the meteorological time series provided in CAMELS-IND, we used them as input to the LSTM-based regional hydrological model (described in Sect. 6). We compared the resulting performance with that obtained using GLDAS meteorological time series as input, for 200 catchments with continuous streamflow observations from 1991 to 2015. The model was trained on data from 1991 to 2011 and tested on data from 2011 to 2015. The results indicate better model performance with IMDAA forcings than with GLDAS forcings (Fig. 11b). For the majority of catchments (165 out of 200), the model performed better with IMDAA forcings. Minor improvements (NSE difference ≈ 0.02) were observed in 13 catchments, while performance deteriorated significantly in 22 catchments with GLDAS forcings. The streamflow simulated with IMDAA forcings also had smaller biases than that simulated with GLDAS forcings (Fig. 11c). With IMDAA forcings, the median percentage bias was −11.74 %, the low-flow bias (FLV) −19.48 %, and the high-flow bias (FHV) −18.58 %; with GLDAS forcings, the corresponding values were −20.39 %, −22.5 %, and −25.22 %. Overall, the preliminary assessment suggests that the meteorological time series from IMD and IMDAA provide more reliable model performance than global data sources. IMD and IMDAA are the best available national data sources for the Indian region.

The preliminary results demonstrate the potential of the CAMELS-IND dataset to enhance the performance of hydrological applications. However, the dataset has limitations. Several factors, including data collection methods, processing techniques, and measurement errors, can introduce uncertainties into the dataset. For instance, the use of different instruments and methodologies over time can lead to inconsistencies in measurements, particularly for variables such as rainfall and streamflow. Because the dataset provides catchment-averaged indices and series, differences in spatial resolution between satellite, ground-based, and reanalysis products have a relatively limited impact on overall data quality. Nonetheless, gaps in data coverage and the presence of spurious values can further increase uncertainty. A detailed assessment and quantification of uncertainty is beyond the scope of this paper and will be addressed in future versions of the dataset when ground-based observations become publicly available.

Figure 11

(a) Pearson correlation coefficient between catchment mean annual time series from IMDAA and GLDAS. (b) LSTM model performance with IMDAA and GLDAS as inputs. (c) Calculated performance metrics of the LSTM model with IMDAA and GLDAS forcing inputs.

[Figure omitted. See PDF]

8 Possible future extensions

The CAMELS-IND dataset currently provides hydrometeorological time series and catchment attributes for only 472 catchments in Peninsular India. This coverage is limited by the availability of openly accessible, quality-controlled data. However, India has 4824 catchments at present, many of which are restricted because they lie in transboundary river basins. While meteorological data are available for the entire country, the primary constraint is the availability of consistent streamflow observations. In future versions, we aim to address this limitation by applying conceptual or physics-based models and the regionally trained LSTM-based hydrological model to other locations. These models would provide simulated streamflow series for catchments currently classified as restricted. Additionally, we intend to use streamflow series from other sources to improve the spatial coverage of the dataset. Examples include GloFAS (Global Flood Awareness System; Harrigan et al., 2020) and satellite altimetry (Verma et al., 2021; Rai et al., 2021). To ensure accuracy, these streamflow time series will be validated against ground-based measurements at selected stations in restricted regions.

Groundwater is vital to understanding hydrological extremes such as floods (Sharma and Mujumdar, 2024) and droughts (Hellwig et al., 2020). However, the current CAMELS-IND dataset lacks groundwater data. To address this limitation, we aim to incorporate the groundwater level data available on the India-WRIS portal (https://indiawris.gov.in/wris/, last access: 18 November 2024). We also aim to include groundwater level data derived from the Gravity Recovery and Climate Experiment (GRACE) (Li et al., 2019; Moudgil and Rao, 2023; Gautam et al., 2024). Including groundwater data will improve the dataset's ability to capture the complex interactions between surface and subsurface systems, thereby enhancing our understanding of hydrological processes and extreme events.

The catchment boundaries in the CAMELS-IND dataset are derived from Goteti (2023). Goteti (2023) noted that catchment delineation based on the 500 m HydroSHEDS v1 dataset may suffice for larger catchments but could introduce minor mismatches, particularly for smaller catchments and around flow outlets. The upcoming HydroSHEDS v2, based on 12 m topographic data, is anticipated to improve spatial accuracy. This should in turn improve the delineation of catchment boundaries and river networks in future versions of the dataset.

9 Data availability

The CAMELS-IND dataset is freely available at 10.5281/zenodo.14005378 (Mangukiya et al., 2024). The repository includes:

1. 00_CAMELS_IND_Data_Description.pdf, which describes the data sources, references, and file structure of the dataset;
2. CAMELS_IND_All_Catchments.zip, which contains the hydrometeorological time series, catchment attributes, and GIS shapefiles for all 472 catchments, as well as a filter_catchment.py script that helps users create a subset of the dataset based on specific requirements;
3. CAMELS_IND_Catchments_Streamflow_Sufficient.zip, a subset of the data for the 228 catchments with observed streamflow for at least 30 % of the period between 1980 and 2020.
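The 228-catchment subset can be expressed as a threshold on the `flow_availability` attribute (Table A2). The sketch below illustrates that kind of selection with Python's standard `csv` module. It is not the `filter_catchment.py` script shipped with the dataset. The file layout (a delimited text file with `gauge_id` and `flow_availability` columns) and the delimiter are assumptions to be checked against the data description, and the function name is hypothetical.

```python
import csv
import io


def select_catchments(csv_text, min_availability=30.0, delimiter=","):
    """Return gauge IDs whose flow_availability (%) is at least the threshold."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    selected = []
    for row in reader:
        value = (row.get("flow_availability") or "").strip()
        if value and float(value) >= min_availability:
            selected.append(row["gauge_id"].strip())  # keep IDs as strings to preserve leading zeros
    return selected


example = "gauge_id,flow_availability\n03001,85.2\n03002,12.0\n04010,30.0\n"
print(select_catchments(example))  # → ['03001', '04010']
```

For a file on disk, the text can be read with `pathlib.Path(path).read_text()` and passed in unchanged.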

10 Concluding remarks

India has hydrologically distinct catchments, each with unique characteristics. However, Indian catchments are often underutilized in global hydrological studies owing to a lack of analysis-ready datasets. To address this gap, we introduce CAMELS-IND (Catchment Attributes and MEteorology for Large-sample Studies – India), which covers 472 catchments in Peninsular India. It provides catchment mean time series of meteorological variables and around 211 catchment attributes. The attributes represent location and topography, climate, hydrological signatures, land use land cover (LULC), soil and geology, and anthropogenic influences. Such a dataset is essential for understanding hydrologic processes across multiple spatiotemporal scales and supports various applications in planning and regulating water resources in India. CAMELS-IND follows the same standards as the previously developed CAMELS datasets for the USA, Chile, Brazil, Great Britain, Australia, Switzerland, and Germany. This facilitates comparisons with catchments in those countries and inclusion in global hydrological studies.

CAMELS-IND serves as a stepping stone towards providing large-sample hydrometeorological time series and attributes of Indian catchments to the global and national hydrological communities. We plan to update and expand the dataset with additional catchment attributes and meteorological forcings as new national data sources become available. For example, future versions of CAMELS-IND could include additional catchment attributes that better characterize heterogeneity and regulation within each catchment. Additionally, because data uncertainties are inherent, future studies will explore them through comparisons with additional data sources.

The creation of CAMELS-IND aims to foster large-sample hydrological studies in India and promote the inclusion of Indian catchments in global hydrological research. Furthermore, it will enhance the reproducibility and transparency of hydrological studies in India by providing a standardized dataset.

Appendix A

Figure A1

(a) Evaporation from canopy and soil surface; (b) mean annual actual evapotranspiration; (c) mean annual potential evapotranspiration; (d) asynchronicity; (e) precipitation uniformity; (f) streamflow elasticity; (g) streamflow uniformity (Gini coefficient); (h) baseflow index; and (i–k) fraction of reservoir storage used for irrigation, hydroelectric power, and flood control.

[Figure omitted. See PDF]

Table A1

Summary of streamflow and catchment mean meteorological time series for a period from 1 January 1980 to 31 December 2020.

Time series class	Variable name	Description	Unit	Data source/reference
Hydrological time series	streamflow_observed	available observed streamflow time series	m³ s⁻¹	India-WRIS
	LSTM_pred_streamflow	regionally trained LSTM-based hydrological model-predicted streamflow time series	m³ s⁻¹	Mangukiya et al. (2023)
Meteorological time series	prcp	precipitation	mm d⁻¹	IMD (Pai et al., 2014)
	tmax	maximum temperature	°C	Srivastava et al. (2009)
	tmin	minimum temperature	°C
	tavg	averaged temperature	°C	(tmax + tmin) / 2
	srad_lw	surface downward long-wave radiation flux	W m⁻²	IMDAA (Rani et al., 2021)
	srad_sw	surface downward short-wave radiation flux	W m⁻²
	wind_u	$U$ component of wind (10 m)	m s⁻¹
	wind_v	$V$ component of wind (10 m)	m s⁻¹
	wind	averaged wind speed (10 m)	m s⁻¹	$\sqrt{(wind_u^{2} + wind_v^{2})}$
	rel_hum	relative humidity (2 m)	%	IMDAA (Rani et al., 2021)
	evap_canopy	evaporation rate from canopy	mm d⁻¹
	evap_surface	evaporation rate from the soil surface	mm d⁻¹
	pet	potential evapotranspiration (1981–2020)	mm d⁻¹	Singer et al. (2021)
	pet_gleam	potential evapotranspiration	mm d⁻¹	GLEAM (Miralles et al., 2011)
	aet_gleam	actual evapotranspiration	mm d⁻¹
	sm_lvl1	soil moisture of layer 1 (0–0.1 m below ground)	kg m⁻²	IMDAA (Rani et al., 2021)
	sm_lvl2	soil moisture of layer 2 (0.1–0.35 m below ground)	kg m⁻²
	sm_lvl3	soil moisture of layer 3 (0.35–1 m below ground)	kg m⁻²
	sm_lvl4	soil moisture of layer 4 (1–3 m below ground)	kg m⁻²
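The two derived variables in Table A1 follow directly from the formulas in the last column, as in the short NumPy sketch below. The sketch assumes catchment-averaged daily inputs as arrays, and the function name is hypothetical. If `wind_u` and `wind_v` are averaged over the catchment before the magnitude is taken, the result can be smaller than the catchment mean of pixel-level wind speeds, because opposing wind directions partly cancel.

```python
import numpy as np


def derived_forcings(tmax, tmin, wind_u, wind_v):
    """Return (tavg, wind) from daily temperature extremes and 10 m wind components."""
    tavg = (np.asarray(tmax, dtype=float) + np.asarray(tmin, dtype=float)) / 2.0
    # np.hypot computes sqrt(u**2 + v**2) element-wise
    wind = np.hypot(np.asarray(wind_u, dtype=float), np.asarray(wind_v, dtype=float))
    return tavg, wind
```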

Table A2

Summary of catchment attributes representing gauge names and identifiers.

Attribute	Description	Unit	Data source/reference
gauge_id	gauge station identifier (five digits; the first two digits are from the CWC basin code, and the last three digits are the station number)	–
ghi_stn_id	unique ID used to identify a station, 10 characters long	–	GHI (Goteti, 2023)
cwc_site_name	name of the station	–	CWC
river_basin	name of the river basin	–	CWC
cwc_river	river/tributary	–	CWC
ghi_group	GHI assigned group (G1 or G2)	–	GHI (Goteti, 2023)
flow_availability	percentage duration for which streamflow data are available between 1980 and 2020	%	CWC
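Because `gauge_id` packs the CWC basin code and the station number into five digits, it can be split as in the sketch below. The zero-padding step is an assumption. It guards against IDs read as integers, which would drop a leading zero. The function name is hypothetical.

```python
def parse_gauge_id(gauge_id):
    """Split a five-digit gauge_id into (CWC basin code, station number)."""
    gid = str(gauge_id).strip().zfill(5)  # restore leading zeros lost by integer parsing
    if len(gid) != 5 or not gid.isdigit():
        raise ValueError(f"expected a five-digit gauge_id, got {gauge_id!r}")
    return gid[:2], gid[2:]


print(parse_gauge_id(3001))  # → ('03', '001')
```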

Table A3

Summary of catchment attributes representing location and topography.

Attribute	Description	Unit	Data source/reference
cwc_lat	latitude of the station	decimal degrees	CWC
cwc_lon	longitude of the station	decimal degrees	CWC
ghi_lat	latitude of the GHI-relocated station	decimal degrees	GHI (Goteti, 2023)
ghi_lon	longitude of the GHI-relocated station	decimal degrees	GHI (Goteti, 2023)
elev_mean	catchment mean elevation	m	SRTM DEM 90 m
elev_median	catchment median elevation	m	SRTM DEM 90 m
elev_min	catchment min elevation	m	SRTM DEM 90 m
elev_max	catchment max elevation	m	SRTM DEM 90 m
slope_mean	catchment mean slope	%	SRTM DEM 90 m
catchment median slope	%	SRTM DEM 90 m
slope_min	catchment min slope	%	SRTM DEM 90 m
slope_max	catchment max slope	%	SRTM DEM 90 m
cwc_area	catchment drainage area	km²	CWC
ghi_area	catchment drainage area	km²	GHI (Goteti, 2023)
gauge_elevation	elevation of the gauging station	m	SRTM DEM 90 m
dpsbar	catchment mean drainage path slope	m km⁻¹	SRTM DEM 90 m

Table A4

Summary of catchment attributes representing climate indices computed from 1 January 1980 to 31 December 2020.

Attributes	Description	Unit	Data source/reference
p_mean	mean daily precipitation	mm d⁻¹	IMD
p_max	maximum daily precipitation	mm d⁻¹	IMD
p_mean_anum	annual average total precipitation	mm	IMD
p_monthly_variability	variation in precipitation patterns throughout the year (higher values indicate greater variation)	–	IMD
p_annual_variability	variation in annual precipitation patterns (higher values indicate greater variation)	–	IMD
p_unif	how uniformly the precipitation is distributed in a year; 0 if the annual maximum precipitation is uniformly distributed throughout the year; 1 if the annual maximum precipitation occurs in a single day	–	IMD
high_prec_freq	frequency of high precipitation days ( $\geq 5$ times the mean daily precipitation)	d yr⁻¹	IMD
high_prec_dur	average duration of high precipitation events (number of consecutive days $\geq 5$ times the mean daily precipitation)	days	IMD
max_high_prec_dur	maximum number of consecutive days with precipitation $\geq 5$ times the mean daily precipitation	days	IMD
high_prec_timing	season during which most high precipitation days ( $\geq 5$ times the mean daily precipitation) occur	season	IMD
low_prec_freq	frequency of dry days (precipitation $< 1$ mm d⁻¹)	d yr⁻¹	IMD
low_prec_dur	average duration of dry periods (number of consecutive days $< 1$ mm d⁻¹)	days	IMD
max_low_prec_dur	maximum number of consecutive days with precipitation $< 1$ mm d⁻¹	days	IMD
low_prec_timing	season during which most dry days ( $< 1$ mm d⁻¹) occur	season	IMD
asynchronicity	asynchronicity between the annual precipitation and PET cycles, where high values represent high relative magnitude and phase differences	–	Feng et al. (2019)
tmin_mean	mean daily minimum temperature	°C	IMD
tmax_mean	mean daily maximum temperature	°C	IMD
pet_mean	mean daily potential evapotranspiration	mm d⁻¹	Singer et al. (2021)
pet_min	minimum daily potential evapotranspiration	mm d⁻¹	Singer et al. (2021)
pet_max	maximum daily potential evapotranspiration	mm d⁻¹	Singer et al. (2021)
pet_mean_anum	annual average total potential evapotranspiration	mm	Singer et al. (2021)
pet_gleam_mean	mean daily potential evapotranspiration	mm d⁻¹	Miralles et al. (2011)
aet_gleam_mean	mean daily actual evapotranspiration	mm d⁻¹	Miralles et al. (2011)
evap_canopy_mean	mean daily evaporation rate from the canopy	mm d⁻¹	IMDAA
evap_canopy_min	minimum daily evaporation rate from the canopy	mm d⁻¹	IMDAA
evap_canopy_max	maximum daily evaporation rate from the canopy	mm d⁻¹	IMDAA
evap_canopy_anum	annual average total evaporation from the canopy	mm	IMDAA
evap_surface_mean	mean daily evaporation rate from the soil surface	mm d⁻¹	IMDAA
evap_surface_min	minimum daily evaporation rate from the soil surface	mm d⁻¹	IMDAA
evap_surface_max	maximum daily evaporation rate from the soil surface	mm d⁻¹	IMDAA
evap_surface_anum	annual average total evaporation from the soil surface	mm	IMDAA
aridity_p_pet	aridity index (P/PET; ratio of mean annual precipitation to mean annual potential evapotranspiration)	–
aridity_pet_aet	aridity index [(PET-AET)/PET; deficit between potential and actual evapotranspiration relative to potential evapotranspiration]	–
ai_mean	spatially averaged aridity index of the catchment	–	Trabucco and Zomer (2018)
rel_hum_mean	mean daily relative humidity (2 m)	%	IMDAA
srad_lw_mean	mean daily surface downward long-wave radiation flux	W m⁻²	IMDAA
srad_sw_mean	mean daily surface downward short-wave radiation flux	W m⁻²	IMDAA
wind_mean	mean daily wind speed (10 m)	m s⁻¹	IMDAA
sm_lvl1_mean	mean daily soil moisture in layer 1 (0–0.1 m below ground)	kg m⁻²	IMDAA
sm_lvl2_mean	mean daily soil moisture in layer 2 (0.1–0.35 m below ground)	kg m⁻²	IMDAA
sm_lvl3_mean	mean daily soil moisture in layer 3 (0.35–1 m below ground)	kg m⁻²	IMDAA
sm_lvl4_mean	mean daily soil moisture in layer 4 (1–3 m below ground)	kg m⁻²	IMDAA
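
Several climate indices in Table A4 are defined by simple thresholds on the daily catchment mean precipitation series. The sketch below illustrates those definitions; it is not the authors' implementation. It assumes a gap-free daily series, converts day counts to d yr⁻¹ using 365.25 days per year, and treats an event as a run of consecutive days meeting the threshold. All function names are hypothetical. The P/PET ratio is computed from mean daily values over a common period, which equals the ratio of mean annual totals when both series cover the same days.

```python
from statistics import mean


def run_lengths(flags):
    """Lengths of consecutive runs of True values in a boolean sequence."""
    runs, current = [], 0
    for flag in flags:
        if flag:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def precip_indices(p, days_per_year=365.25):
    """Threshold-based precipitation indices following the Table A4 wording.

    p: gap-free daily catchment mean precipitation (mm/d).
    High days: >= 5 x mean daily precipitation; dry days: < 1 mm/d.
    """
    p_mean = mean(p)
    n_years = len(p) / days_per_year
    high = [x >= 5 * p_mean for x in p]
    dry = [x < 1.0 for x in p]
    high_runs, dry_runs = run_lengths(high), run_lengths(dry)
    return {
        "p_mean": p_mean,
        "high_prec_freq": sum(high) / n_years,               # d/yr
        "high_prec_dur": mean(high_runs) if high_runs else 0.0,
        "max_high_prec_dur": max(high_runs, default=0),
        "low_prec_freq": sum(dry) / n_years,                 # d/yr
        "low_prec_dur": mean(dry_runs) if dry_runs else 0.0,
        "max_low_prec_dur": max(dry_runs, default=0),
    }


def aridity_ratio(p, pet):
    """P/PET from mean daily values over the same period."""
    return mean(p) / mean(pet)
```

In the usage below, a ten-day record with one 20 mm day has a mean of 2 mm d⁻¹. That single day is the only one exceeding the 10 mm threshold, and the remaining nine dry days form runs of 2 and 7 days.

```python
r = precip_indices([0, 0, 20, 0, 0, 0, 0, 0, 0, 0], days_per_year=10)
```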

Table A5

Summary of catchment attributes representing hydrological signatures computed from 1 January 1980 to 31 December 2015.

Attributes	Description	Unit	Data Source/reference
q_mean	mean daily streamflow of the catchment	mm d⁻¹	India-WRIS
runoff_ratio	runoff ratio (ratio of mean daily streamflow to the mean daily precipitation of catchment)	–
streamflow_elas	streamflow precipitation elasticity (i.e., the sensitivity of streamflow to changes in precipitation at the annual timescale, using the mean daily discharge as reference)	–	Eq. (7) in Sankarasubramanian et al. (2001)
slope_fdc	slope of the flow duration curve between the log-transformed 33rd and 66th streamflow percentiles	–	Addor et al. (2017)
bfi	baseflow index, computed as the ratio of mean daily baseflow to mean daily discharge, with the hydrograph separation performed using the Ladson et al. (2013) digital filter	–
q_cv	variability in daily streamflow values (coefficient of variation)	%	India-WRIS
q_10	first decile of mean daily streamflow (the value below which 10 % of the observations fall)	mm d⁻¹	India-WRIS
q_25	first quartile of mean daily streamflow (the value below which 25 % of the observations fall)	mm d⁻¹	India-WRIS
q_50	median of mean daily streamflow (the value below which 50 % of the observations fall)	mm d⁻¹	India-WRIS
q_75	third quartile of mean daily streamflow (the value below which 75 % of the observations fall)	mm d⁻¹	India-WRIS
q_90	90th percentile of mean daily streamflow (the value below which 90 % of the observations fall; high flows)	mm d⁻¹	India-WRIS
q_zero	frequency of days with zero flow	d yr⁻¹	India-WRIS
q_low_days	mean number of consecutive days with flow below the 25th percentile of daily flow	days	India-WRIS
freq_q_low	frequency of days with low flows (flow below the 25th percentile of daily flow)	d yr⁻¹	India-WRIS
q_high_days	mean number of consecutive days with flow above the 95th percentile of daily flow	days	India-WRIS
freq_q_high	frequency of days with high flows (flow above the 95th percentile of daily flow)	d yr⁻¹	India-WRIS
annual_q	mean annual flow volume in the catchment	MCM yr⁻¹	India-WRIS
mean_anum_flow	mean annual flow volume in the catchment (computed for 1950 to 2020)	MCM yr⁻¹	GHI (Goteti, 2023)
cen_time	center timing, corresponding to the day of the year (doy) at which 50 % of annual flow is reached	Day
gini_flow	uniformity of flow over the days in a year; 0 indicates equal flow throughout the year; 1 indicates all flow occurred in a single day	–
annual_max_1day	mean annual 1 d maximum flow	m³ s⁻¹	India-WRIS
annual_max_3day	mean annual 3 d maximum flow	m³ s⁻¹	India-WRIS
annual_max_7day	mean annual 7 d maximum flow	m³ s⁻¹	India-WRIS
annual_max_30day	mean annual 30 d maximum flow	m³ s⁻¹	India-WRIS
annual_max_90day	mean annual 90 d maximum flow	m³ s⁻¹	India-WRIS
annual_min_7day	mean annual 7 d minimum flow	m³ s⁻¹	India-WRIS
month_1day_max	month of 1 d maximum flow for the majority of the years	calendar month	India-WRIS
month_1day_min	month of 1 d minimum flow for the majority of the years	calendar month	India-WRIS
doy_min_flow	the day of the year (doy) at which minimum streamflow occurred	Day
doy_max_flow	the day of the year (doy) at which maximum streamflow occurred	Day
doy_min_flow_7	the day of the year (doy) at which minimum 7 d streamflow occurred	Day
doy_max_flow_7	the day of the year (doy) at which maximum 7 d streamflow occurred	Day
mean_jan_flow	mean monthly flow volume of January in the catchment (computed for 1950 to 2020)	MCM month⁻¹	GHI (Goteti, 2023)
mean_feb_flow	mean monthly flow volume of February in the catchment (computed for 1950 to 2020)	MCM month⁻¹	GHI (Goteti, 2023)
mean_mar_flow	mean monthly flow volume of March in the catchment (computed for 1950 to 2020)	MCM month⁻¹	GHI (Goteti, 2023)
mean_apr_flow	mean monthly flow volume of April in the catchment (computed for 1950 to 2020)	MCM month⁻¹	GHI (Goteti, 2023)
mean_may_flow	mean monthly flow volume of May in the catchment (computed for 1950 to 2020)	MCM month⁻¹	GHI (Goteti, 2023)
mean_jun_flow	mean monthly flow volume of June in the catchment (computed for 1950 to 2020)	MCM month⁻¹	GHI (Goteti, 2023)
mean_jul_flow	mean monthly flow volume of July in the catchment (computed for 1950 to 2020)	MCM month⁻¹	GHI (Goteti, 2023)
mean_aug_flow	mean monthly flow volume of August in the catchment (computed for 1950 to 2020)	MCM month⁻¹	GHI (Goteti, 2023)
mean_sep_flow	mean monthly flow volume of September in the catchment (computed for 1950 to 2020)	MCM month⁻¹	GHI (Goteti, 2023)
mean_oct_flow	mean monthly flow volume of October in the catchment (computed for 1950 to 2020)	MCM month⁻¹	GHI (Goteti, 2023)
mean_nov_flow	mean monthly flow volume of November in the catchment (computed for 1950 to 2020)	MCM month⁻¹	GHI (Goteti, 2023)
mean_dec_flow	mean monthly flow volume of December in the catchment (computed for 1950 to 2020)	MCM month⁻¹	GHI (Goteti, 2023)
cv_jan_flow	variability in daily streamflow values in January	%	India-WRIS
cv_feb_flow	variability in daily streamflow values in February	%	India-WRIS
cv_mar_flow	variability in daily streamflow values in March	%	India-WRIS
cv_apr_flow	variability in daily streamflow values in April	%	India-WRIS
cv_may_flow	variability in daily streamflow values in May	%	India-WRIS
cv_jun_flow	variability in daily streamflow values in June	%	India-WRIS
cv_jul_flow	variability in daily streamflow values in July	%	India-WRIS
cv_aug_flow	variability in daily streamflow values in August	%	India-WRIS
cv_sep_flow	variability in daily streamflow values in September	%	India-WRIS
cv_oct_flow	variability in daily streamflow values in October	%	India-WRIS
cv_nov_flow	variability in daily streamflow values in November	%	India-WRIS
cv_dec_flow	variability in daily streamflow values in December	%	India-WRIS
mean_swmn_flow	mean flow volume of the southwest monsoon season (June, July, August, and September) in the catchment (computed for 1950 to 2020)	MCM season⁻¹	GHI (Goteti, 2023)
mean_atmn_flow	mean flow volume of autumn/retreating monsoon season (October and November) in the catchment (computed for 1950 to 2020)	MCM season⁻¹	GHI (Goteti, 2023)
mean_wint_flow	mean flow volume of the winter season (December, January, and February) in the catchment (computed for 1950 to 2020)	MCM season⁻¹	GHI (Goteti, 2023)
mean_sumr_flow	mean flow volume of the summer season (March, April, and May) in the catchment (computed for 1950 to 2020)	MCM season⁻¹	GHI (Goteti, 2023)
q_mean_swmn	mean daily streamflow of the southwest monsoon season (June, July, August, and September) in a catchment	mm d⁻¹	India-WRIS
q_5_swmn	5th percentile of daily streamflow in southwest monsoon season (June, July, August, and September)	mm d⁻¹	India-WRIS
q_25_swmn	first quartile of daily streamflow in southwest monsoon season (June, July, August, and September)	mm d⁻¹	India-WRIS
q_50_swmn	median of daily streamflow in southwest monsoon season (June, July, August, and September)	mm d⁻¹	India-WRIS
q_75_swmn	third quartile of daily streamflow in southwest monsoon season (June, July, August, and September)	mm d⁻¹	India-WRIS
q_95_swmn	95th percentile of daily streamflow in southwest monsoon season (June, July, August, and September)	mm d⁻¹	India-WRIS
rise_rate_mean	mean of all positive differences between consecutive daily flows	m³ s⁻¹	India-WRIS
rise_rate_median	median of all positive differences between consecutive daily flows	m³ s⁻¹	India-WRIS
rise_days	mean number of days in a year with positive differences between consecutive daily flows	d yr⁻¹	India-WRIS
fall_rate_mean	mean of all negative differences between consecutive daily flows	m³ s⁻¹	India-WRIS
fall_rate_median	median of all negative differences between consecutive daily flows	m³ s⁻¹	India-WRIS
fall_days	mean number of days in a year with negative differences between consecutive daily flows	d yr⁻¹	India-WRIS
num_hyd_alt	mean number of hydrologic reversals in a year (change from rise to fall)	–	India-WRIS
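
Several hydrological signatures in Table A5 follow directly from their definitions. The sketch below illustrates them; it is not the code used to build CAMELS-IND. Percentiles use linear interpolation between order statistics. `slope_fdc` follows the table wording literally, using the 33rd and 66th non-exceedance percentiles. The baseflow index uses a multi-pass Lyne-Hollick filter with reflected end segments, in the spirit of Ladson et al. (2013). The filter parameter (0.925), the three passes, the 30-value reflection, and the zero initial quickflow are assumptions, not settings confirmed for this dataset. All function names are hypothetical.

```python
import math
from statistics import mean


def percentile(values, q):
    """Percentile (q in 0-100) by linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def runoff_ratio(q, p):
    """Mean daily streamflow over mean daily precipitation (both in mm/d)."""
    return mean(q) / mean(p)


def slope_fdc(q):
    """Slope of the FDC between the log-transformed 33rd and 66th percentiles.

    Raises ValueError (math domain error) if either percentile is zero,
    as can happen in intermittent rivers.
    """
    q33, q66 = percentile(q, 33), percentile(q, 66)
    return (math.log(q66) - math.log(q33)) / (0.66 - 0.33)


def _lyne_hollick_pass(q, alpha):
    """One pass of the Lyne-Hollick filter; returns the baseflow series."""
    quick, base = 0.0, []  # simplification: initial quickflow set to zero
    for t, qt in enumerate(q):
        if t > 0:
            quick = alpha * quick + (1 + alpha) / 2 * (qt - q[t - 1])
        quick = min(max(quick, 0.0), qt)  # keep 0 <= quickflow <= flow
        base.append(qt - quick)
    return base


def baseflow_index(q, alpha=0.925, passes=3, reflect=30):
    """BFI from a multi-pass Lyne-Hollick filter with reflected end segments."""
    q = list(q)
    n = len(q)
    head = q[1:reflect + 1][::-1]    # mirror of the first `reflect` values
    tail = q[-reflect - 1:-1][::-1]  # mirror of the last `reflect` values
    series = head + q + tail
    for k in range(passes):
        if k % 2 == 1:  # odd passes run backward in time
            series = _lyne_hollick_pass(series[::-1], alpha)[::-1]
        else:
            series = _lyne_hollick_pass(series, alpha)
    base = series[len(head):len(head) + n]  # drop the reflected padding
    return sum(base) / sum(q)


def half_flow_day(q_year):
    """Day of year (1-based) on which cumulative flow reaches 50 % of the annual total."""
    half, cum = 0.5 * sum(q_year), 0.0
    for day, qt in enumerate(q_year, start=1):
        cum += qt
        if cum >= half:
            return day
    return None  # only reached for an empty year


def gini_flow(q_year):
    """Gini coefficient of daily flows: 0 for uniform flow, (n-1)/n if all flow falls on one day."""
    s = sorted(q_year)
    n, total = len(s), sum(s)
    weighted = sum(i * x for i, x in enumerate(s, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

`half_flow_day` and `gini_flow` operate on a single year of daily flows. A catchment value such as cen_time would then be averaged over years. Note that for a finite year the Gini coefficient approaches, but does not reach, 1 when all flow occurs on one day.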

Table A6

Summary of catchment attributes representing land cover characteristics.

Attributes	Description	Unit	Data Source/reference
water_frac	water cover fraction (2017–2022)	–	Esri land cover (Karra et al., 2021)
trees_frac	trees cover fraction (2017–2022)	–	Esri land cover
flooded_veg_frac	flooded vegetation fraction (2017–2022)	–	Esri land cover
crops_frac	crop cover fraction (2017–2022)	–	Esri land cover
built_area_frac	urban cover fraction (2017–2022)	–	Esri land cover
bare_frac	bare cover fraction (2017–2022)	–	Esri land cover
range_frac	range cover fraction (2017–2022)	–	Esri land cover
dom_land_cover	dominant land cover type (2017–2022)	–	Esri land cover
dom_land_cover_frac	fraction of the catchment covered by the dominant land cover type (2017–2022)	–	Esri land cover
lai_mean	catchment mean leaf area index (2001–2020)	–	MODIS MCD15A2H (Myneni et al., 2015)
lai_min	minimum leaf area index (2001–2020)	–
lai_max	maximum leaf area index (2001–2020)	–
lai_diff	difference between maximum and minimum leaf area index (2001–2020)	–
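
The land cover fractions and the dominant class in Table A6 amount to counting classified pixels inside each catchment. The following is a minimal sketch, assuming a flat list of class labels already clipped to the catchment boundary and equal-area pixels. On a geographic grid that holds only approximately, and area weights would be needed. `land_cover_summary` and the class labels in the example are hypothetical.

```python
from collections import Counter


def land_cover_summary(pixels):
    """Class fractions and dominant class from land cover pixels inside a catchment.

    pixels: iterable of class labels for cells inside the catchment boundary;
    every cell is assumed to have the same area.
    """
    counts = Counter(pixels)
    total = sum(counts.values())
    fractions = {cls: n / total for cls, n in counts.items()}
    dom_cls, dom_n = counts.most_common(1)[0]  # ties resolve to the first class seen
    return fractions, dom_cls, dom_n / total
```

For example, `land_cover_summary(["crops", "crops", "trees", "water"])` gives crops as the dominant class with a fraction of 0.5.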

Table A7

Summary of catchment attributes representing soil characteristics.

Attributes	Description	Unit	Data Source/reference
soil_depth	mean soil and sedimentary deposit thickness	m	Pelletier et al. (2016)
soil_conductivity_top	mean saturated hydraulic conductivity of topsoil (0–30 cm)	cm d⁻¹	HiHydroSoil v2 (Simons et al., 2020)
soil_conductivity_sub	mean saturated hydraulic conductivity of subsoil (30–200 cm)	cm d⁻¹	HiHydroSoil v2
soil_awc_top	mean available water content of topsoil (0–30 cm)	m³ m⁻³	HiHydroSoil v2
soil_awc_sub	mean available water content of subsoil (30–200 cm)	m³ m⁻³	HiHydroSoil v2
soil_awsc_min	minimum available water storage capacity of the soil	mm m⁻¹	FAO Soil Data
soil_awsc_max	maximum available water storage capacity of the soil	mm m⁻¹	FAO Soil Data
soil_awsc_major	available water storage capacity of the soil for the majority of the catchment area	mm m⁻¹	FAO Soil Data
sand_frac_top	fraction of sand in topsoil (0–30 cm) for the majority of the catchment area	% wt	Harmonized World Soil Database (HWSD) v2 (FAO and IIASA, 2023)
sand_frac_sub	fraction of sand in subsoil (30–100 cm) for the majority of the catchment area	% wt	HWSD v2
silt_frac_top	fraction of silt in topsoil (0–30 cm) for the majority of the catchment area	% wt	HWSD v2
silt_frac_sub	fraction of silt in subsoil (30–100 cm) for the majority of the catchment area	% wt	HWSD v2
clay_frac_top	fraction of clay in topsoil (0–30 cm) for the majority of the catchment area	% wt	HWSD v2
clay_frac_sub	fraction of clay in subsoil (30–100 cm) for the majority of the catchment area	% wt	HWSD v2
gravel_frac_top	fraction of gravel in topsoil (0–30 cm) for the majority of the catchment area	% vol	HWSD v2
gravel_frac_sub	fraction of gravel in subsoil (30–100 cm) for the majority of the catchment area	% vol	HWSD v2
bulkdens_top_major	reference bulk density of topsoil (0–30 cm) for the majority of the catchment area	kg dm⁻³	HWSD v2
bulkdens_top_mean	mean reference bulk density of topsoil (0–30 cm)	kg dm⁻³	HWSD v2
bulkdens_sub_major	reference bulk density of subsoil (30–100 cm) for the majority of the catchment area	kg dm⁻³	HWSD v2
bulkdens_sub_mean	mean reference bulk density of subsoil (30–100 cm)	kg dm⁻³	HWSD v2
org_carb_top_major	organic carbon content in topsoil (0–30 cm) for the majority of the catchment area	% wt	HWSD v2
org_carb_top_mean	mean organic carbon content in topsoil (0–30 cm)	% wt	HWSD v2
org_carb_sub_major	organic carbon content in subsoil (30–100 cm) for the majority of the catchment area	% wt	HWSD v2
org_carb_sub_mean	mean organic carbon content in subsoil (30–100 cm)	% wt	HWSD v2
organic_frac_top	mean fraction of organic matter content in topsoil (0–30 cm)	–	HiHydroSoil v2 (Simons et al., 2020)
organic_frac_sub	mean fraction of organic matter content in subsoil (30–200 cm)	–	HiHydroSoil v2
hsg_major	hydrological soil group for the majority of the catchment area	–	HiHydroSoil v2
wtd	catchment mean water table depth	m	Fan et al. (2013)

Table A8

Summary of catchment attributes representing geological characteristics.

Attributes	Description	Unit	Data Source/reference
geol_porosity	mean subsurface porosity	–	GLHYMPS (Gleeson et al., 2014)
geol_permeability	mean subsurface permeability	m²	GLHYMPS
geol_class_1st	most common geological class in a catchment	–	GLiM (Hartmann and Moosdorf, 2012)
geol_class_1st_frac	fraction of catchment area associated with its most common geological class	–	GLiM
geol_class_2nd	second most common geological class in the catchment	–	GLiM
geol_class_2nd_frac	fraction of catchment area associated with its second most common geological class	–	GLiM
carb_rocks_frac	fraction of catchment area characterized as “carbonate sedimentary rocks”	–	GLiM

Table A9

Summary of catchment attributes representing anthropogenic influences.

Attributes	Description	Unit	Data Source/reference
num_dams	total number of large and medium dams in catchments	–	India-WRIS
res_store_sum	sum of total volume content of dams within the catchment	10³ m³	India-WRIS
n_dams	total number of dams in a catchment	–	GRaND (Lehner et al., 2011)
first_dam_year	year of construction of the first dam	–	GRaND
latest_dam_year	year of construction of the most recent dam	–	GRaND
total_storage	total storage of the reservoirs	m³	GRaND
reservoir_index	ratio of total storage to multiyear mean annual streamflow	–	GRaND
irrigation_frac	percentage of dams used for irrigation	–	GRaND
hydroelec_frac	percentage of dams used for hydroelectric generation	–	GRaND
drinking_frac	percentage of dams used for drinking water supply	–	GRaND
flood_frac	percentage of dams used for flood storage	–	GRaND
overflow_frac	percentage of dams used for overflow control	–	GRaND
navigation_frac	percentage of dams used for navigation	–	GRaND
tailings_frac	percentage of dams used for tailings (storing by-products of mining operations)	–	GRaND
pop_density_2000	average population density of the catchment in 2000	people km⁻²	http://data.humdata.org (last access: 18 November 2024; WorldPop and CIESIN, 2018)
pop_density_2005	average population density of the catchment in 2005	people km⁻²	http://data.humdata.org
pop_density_2010	average population density of the catchment in 2010	people km⁻²	http://data.humdata.org
pop_density_2015	average population density of the catchment in 2015	people km⁻²	http://data.humdata.org
pop_density_2020	average population density of the catchment in 2020	people km⁻²	http://data.humdata.org
urban_frac_1985	fraction of urban land cover in a catchment in 1985	–	Roy et al. (2015)
urban_frac_1995	fraction of urban land cover in a catchment in 1995	–	Roy et al. (2015)
urban_frac_2005	fraction of urban land cover in a catchment in 2005	–	Roy et al. (2015)
crops_frac_1985	fraction of cropland land cover in a catchment in 1985	–	Roy et al. (2015)
crops_frac_1995	fraction of cropland land cover in a catchment in 1995	–	Roy et al. (2015)
crops_frac_2005	fraction of cropland land cover in a catchment in 2005	–	Roy et al. (2015)
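
Two groups of anthropogenic attributes in Table A9 reduce to simple ratios. The sketch below computes `reservoir_index` as total storage divided by the multiyear mean annual streamflow volume. It also computes the share of dams listing each purpose, where a dam may serve several purposes. It assumes storage and flow volumes are in the same unit (m³ here). Each dam record is assumed to carry a hypothetical `uses` set; these field and function names are illustrative and do not correspond to GRaND attribute names.

```python
from statistics import mean


def reservoir_index(total_storage_m3, annual_flow_volumes_m3):
    """Total reservoir storage divided by the multiyear mean annual streamflow volume."""
    return total_storage_m3 / mean(annual_flow_volumes_m3)


def purpose_percentages(dams, purposes):
    """Percentage of dams whose (hypothetical) `uses` set includes each purpose."""
    n = len(dams)
    return {p: 100.0 * sum(p in d["uses"] for d in dams) / n for p in purposes}
```

A reservoir index of 0.5, for instance, means the dams can store half of an average year's streamflow volume.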

Author contributions

NKM: conceptualization; data curation; formal analysis; investigation; visualization; writing (original draft preparation); writing (review and editing). KBK: conceptualization; data curation; formal analysis; investigation; writing (original draft preparation). PD: conceptualization; writing (review and editing). SS: conceptualization; writing (review and editing). VB: data curation; formal analysis. PPM: conceptualization; resources; supervision; writing (review and editing). AS: conceptualization; resources; supervision; writing (review and editing).

Competing interests

The contact author has declared that none of the authors has any competing interests.

Disclaimer

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Acknowledgements

The authors gratefully acknowledge the Central Water Commission (CWC), the National Water Informatics Centre (NWIC), and the Ministry of Jal Shakti (MoJS) for providing the streamflow dataset through the online portal, India – Water Resources Information System (India-WRIS; https://indiawris.gov.in/wris/#/, last access: 18 November 2024). The authors also extend their gratitude to the India Meteorological Department (IMD), Ministry of Earth Sciences, Government of India, for providing the gridded rainfall and temperature datasets through their respective websites. Additionally, the authors gratefully acknowledge the National Centre for Medium Range Weather Forecasting (NCMRWF), Ministry of Earth Sciences, Government of India, for the Indian Monsoon Data Assimilation and Analysis (IMDAA) reanalysis. The IMDAA reanalysis was produced under the collaboration between UK Met Office, NCMRWF, and IMD, with financial support from the Ministry of Earth Sciences under the National Monsoon Mission program. The authors utilized numerous publicly available datasets for compiling catchment attributes and the meteorological forcing time series, duly acknowledging and citing them where applicable. The authors extend their gratitude to all the researchers and contributing authors of these open-source datasets.

Financial support

Kanneganti Bhargav Kumar received financial support through the Prime Minister Research Fellowship (grant no. 0200367) provided by the Ministry of Education, Government of India. Pankaj Dey received financial support through the DST INSPIRE Faculty Fellowship (faculty reg. no. IFA22-EAS114 and application reference no. DST/INSPIRE/04/2022/001952). Vijaykumar Bejagam received financial support through the Prime Minister Research Fellowship (grant no. PM-31-22-659-414) provided by the Ministry of Education, Government of India.

Review statement

This paper was edited by Conrad Jackisch and reviewed by Ashish J. Manoj and Gemma Coxon.

© 2025. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

We introduce CAMELS-IND (Catchment Attributes and MEteorology for Large-sample Studies – India), a dataset containing hydrometeorological time series and catchment attributes for 472 catchments in Peninsular India, of which 228 catchments have observed streamflow data available for more than 30 % of the period between 1980 and 2020. Peninsular India covers 15 interstate river basins defined by the Central Water Commission (CWC), where river flow and water level datasets are available for several gauge stations through the open-source India Water Resources Information System (India-WRIS). However, many of these gauge stations lack reliable metadata, and their data are not in an analysis-ready format for large-sample hydrological studies. Therefore, we used 472 gauge stations identified as having reliable metadata, together with their catchment boundaries, from the Geospatial dataset for hydrologic analyses in India (GHI) (Goteti, 2023). For each of these catchments, CAMELS-IND provides catchment mean time series of meteorological forcings for 41 years (1980–2020) and 211 catchment attributes representing hydroclimatic and land cover characteristics, extracted from multiple data sources (including ground-based observations, remote sensing-based products, and reanalysis datasets). CAMELS-IND follows the same standards as the previously developed CAMELS datasets for the USA, Chile, Brazil, Great Britain, Australia, Switzerland, and Germany to facilitate comparisons with catchments in those countries and inclusion in global hydrological studies. 
Notably, CAMELS-IND includes available observed streamflow and catchment mean time series of 19 meteorological forcings, including precipitation; maximum, minimum, and average temperature; long-wave and short-wave radiation flux; U and V components of wind; relative humidity; evaporation rates from the canopy and soil surface; actual and potential evapotranspiration; and soil moisture in four layers (covering depths up to 3 m below ground) for detailed hydrometeorological studies. We also derived catchment attributes representing human influences, including the number of dams and their uses, the total volume content of dams in each catchment, population density, and increases in urban and agricultural land cover, to facilitate studies of human influences on catchment hydrology. Furthermore, the dataset includes predicted streamflow time series from a regionally trained long short-term memory (LSTM)-based hydrological model for all 472 catchments, which can fill gaps in observed streamflow data or serve as a benchmark for testing and developing new hydrological models. We envision that CAMELS-IND will provide a strong foundation for a community-led effort toward gaining new hydrological insights from hydrologically distinct Indian catchments and addressing pertinent issues related to water management, quantification and risk assessment of hydrologic extremes, unraveling regional-scale hydrologic functioning, and climate change impact assessment for catchments across India. The CAMELS-IND dataset is available at 10.5281/zenodo.14005378 (Mangukiya et al., 2024).

Details

Title

CAMELS-IND: hydrometeorological time series and catchment attributes for 228 catchments in Peninsular India

Author

Mangukiya, Nikunj K¹; Kanneganti, Bhargav Kumar²; Dey, Pankaj¹; Sharma, Shailza²; Bejagam, Vijaykumar¹; Mujumdar, Pradeep P³; Sharma, Ashutosh⁴

¹ Department of Hydrology, Indian Institute of Technology Roorkee, Roorkee, 247667, Uttarakhand, India
² Department of Civil Engineering, Indian Institute of Science, Bangalore, 560012, Karnataka, India
³ Department of Civil Engineering, Indian Institute of Science, Bangalore, 560012, Karnataka, India; Interdisciplinary Centre for Water Research, Indian Institute of Science, Bangalore, 560012, Karnataka, India
⁴ Department of Hydrology, Indian Institute of Technology Roorkee, Roorkee, 247667, Uttarakhand, India; International Centre of Excellence for Dams, Indian Institute of Technology Roorkee, Roorkee, 247667, Uttarakhand, India

Pages

461-491

Publication year

2025

Publication date

2025

Publisher

Copernicus GmbH

ISSN

18663508

e-ISSN

18663516

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.5194/essd-17-461-2025

ProQuest document ID

3163365732
