Habib et al. Complex Adapt Syst Model (2016) 4:8 DOI 10.1186/s40294-016-0020-0
Complex building's energy system operation patterns analysis using bag of words representation with hierarchical clustering
Usman Habib1,3*, Khizar Hayat2 and Gerhard Zucker1
*Correspondence: [email protected]
1 Energy Department, AIT Austrian Institute of Technology, Giefinggasse 2, 1210 Vienna, Austria. Full list of author information is available at the end of the article.
Background
This paper is an extension of work originally presented in the proceedings of the Frontiers of Information Technology (FIT15) Conference 2015 (Habib and Zucker 2015). The energy systems of a typical contemporary building are usually complex and may contain several subsystems deployed independently of each other. In order to analyze various energy performance aspects of a given building, a large amount of raw data is recorded during its monitoring (Khan et al. 2011). The recorded data is studied at later stages in order to find interesting features, using a variety of visualization tools (Mourad and Bertrand-Krajewski 2002). The massive amount of recorded data makes any detailed performance analysis a formidable task. Moreover, there is a high chance of overlooking some important patterns in the data which, if noticed, may help identify faults that can compromise energy efficiency.

© 2016 The Author(s). This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Patterns are regular, usually repetitive, sequences in given data and may owe their existence to a specific event. A pattern is thus dependent on the characteristics of a system and may represent the underlying processes and structure of the system. Methods that can automatically identify interesting patterns in building data help to gain useful insights into the various parameters of energy usage as well as the source of faults in different components. In this context, data mining techniques like clustering are feasible tools to address these issues. Automatically finding the various patterns in the data can make the subsequent analysis easier, more feasible and less laborious (Miller et al. 2015; Iglesias and Kastner 2013; Narayanaswamy et al. 2014; Lin and Li 2009).
We aim to exploit machine learning for finding various patterns in energy related building data. The idea is to realize this with the minimum possible configuration changes and knowledge of the relevant field. More specifically, in order to automatically find different patterns in an adsorption chiller's operation, we propose in this article to use a bag of words representation (BoWR) with subsequent hierarchical clustering. The suggested method has been applied to the operation data of a water chiller and compared to another approach, dynamic time warping (DTW), using the cophenetic correlation. The DTW method uses a dynamic programming technique for finding the best alignment between two time series (Keogh and Ratanamahatana 2004). The cophenetic correlation indicates how strongly the cluster tree correlates with the distances between objects in the distance vector (Lin and Li 2009). The On/Off state information required for the suggested technique is detected by using the k-means clustering algorithm. Since the sensors are placed outside the chiller, their readings reflect the behavior of the chiller during its operational cycle. Thus, the On (operational) states are of greater importance for assessing the performance of chillers and for fault detection and diagnosis (FDD). Moreover, excluding the Off cycles when finding different patterns reduces the amount of data as well. The On (operational) cycles are discretized by using the symbolic aggregate approximation (SAX) method. These discretized values are called symbols or words. After transformation of the On cycles to words, a normalized histogram is created for each On cycle; this is called the bag of words representation (BoWR). The normalized BoWR is used because the On (operational) cycles vary in their duration. The hierarchical clustering uses the normalized BoWR of the On cycles for finding the various operational patterns of the chiller. The different clusters created by hierarchical clustering are also explained in detail.
The rest of the paper is arranged as follows. The next section discusses the state of the art methods available in the literature. In the subsequent section, the design of the demonstrated system is elaborated. This is followed by a section describing the methodology of the proposed solution for finding the patterns in the data. The penultimate section explains the different experiments and results, followed by a "Conclusion" section.
State of the art
This section discusses state of the art methods from the literature for finding operation patterns in different energy systems, in the context of buildings. The energy systems can be modeled using simulation tools.
Complex systems (CA)
Complex systems consist of many interacting components and many hierarchical layers and, in general, it is impossible to reduce the overall behavior of the system to a set of properties characterizing the individual components. The interaction between components is able to produce properties at the collective level that are simply not present in the components considered individually (Avram and Rizescu 2014; Moffat 2010). The focus of complex systems is to study the system modules (subsystems), their interaction with each other, and how each module contributes to the overall behavior of the system. Examples of complex systems are:
people creating social systems,
our nervous system, with the brain, spinal cord and neurons being the subsystems,
a weather forecast system, with factors like wind flow, pressure and temperature contributing to predictions, and
cities, which can be considered as a system in which different aspects such as social physics, urban economics, transportation theory, regional science and urban geography act as subsystems (agents) for designing the cities (Batty 2007).
The knowledge of complex systems can be used in all traditional disciplines of science, along with engineering and management. The main focus in the complex systems is on questions regarding parts (independent), overall behavior and relationships. The information provided with the complex systems, using sophisticated tools, is helpful to think analytically about these systems in detail and contributes in the modeling or simulation analysis of these systems.
Complex adaptive systems (CAS)
A complex adaptive system (CAS) can be defined as an open system with large variability and diversity of elements or agents, with dynamic interactions among them that create non-linear feedback systems (Faucher 2010). Such systems are usually linked to learning activities, in order to provide various features of CAS, like self-organization and unpredictability. They are also described as special cases of complex systems, namely complex macroscopic collections of relatively similar micro-structures that are partially connected. These macro-structures are formed to adapt to changes in the environment and to increase their survivability (Kayman 2014).
The subsystems of a complex system are generally modeled as agents. The agents are usually goal-oriented and variable in number, and the condition of the environment can be affected by other agents. The three basic properties of a CAS are (Andrii 2014):
1. Adaptation: This characteristic of CAS is relevant to the adaptability of the system to changes in the environment.
2. Self-organizing: This characteristic is dependent on the structure of the system as well as its internal processes; the underlying question being how the larger dynamic system organizes itself in critical situations.
3. Emergence: This characteristic defines the qualitative change in the behavior of the system during a change in its observation scale. It is one of the common characteristics of CAS, where the behavior of the system is more complex than the sum of the behaviors of its components. The emergent property is lost when the system is decomposed into its component parts or when some component is eliminated.
In order to study complex systems, one has to take everything into account: the components, their interactions and the overall behavior of the system. Still, the emergent behavior cannot be discounted. One of the common methods is to ignore some system details, that is, to find a higher abstraction level of the system. In multidimensional scenarios, the space can be reduced by using mapping (generating a few equivalence classes). There are factors usually ignored by CAS designers that can constrain the system and influence its long-term performance; these are referred to as energy. The authors in Hadzikadic (2010) discuss the changes and influence due to the variation in available energy. Furthermore, adding the concepts of efficiency and resilience to complex adaptive systems can be beneficial in modeling (Korhonen and Snäkin 2015).
Buildings as CAS
Complex systems scale from large systems like ecosystems (Levin 1998; Grimm et al. 2005) or social-ecological systems (Olsson et al. 2004) to smaller systems such as secure authentication systems (Habib et al. 2011) or buildings (Oosterhuis 2012) and their energy systems (Azar and Menassa 2010, 2011; Jensen et al. 2016). Limited area notwithstanding, the analysis of a building's energy system is a complex task as it consists of several subsystems. In order to make a detailed analysis of the energy systems, the buildings are monitored using sensors. Nowadays, it is feasible to maintain a record of the historic operation data of a building. While there exist other domains that have considerably higher amounts of data, the operation data of buildings are specifically challenging, since there is commonly no appropriate underlying data model that can be generally applied to operation data; the data is very specific to one building or component. Thus, data analytics methods have to supply a high degree of unsupervised automation in order to treat different types of data. At present, the main approach for data analysis is a simple visualization of the process parameters using time graphs. Such visualizations may further require manual follow-up performance analysis. Methods of analysis like these can be time intensive, and there is always a chance of missing some areas of interest that may eventually be of greater importance (Mourad and Bertrand-Krajewski 2002).
In order to make buildings energy efficient, their prototype models are simulated for energy performance. For better design and the ability to handle the dynamic nature of a building's characteristics, each component of the building can be modeled as an active part; the different components of the building thus constitute a complex network (Oosterhuis 2012). There are many energy modeling methods that are generally used for predicting a building's performance during the design phase. The actual
energy consumption reading usually deviates from the value predicted during the modeling phase (Azar and Menassa 2010, 2011). Some of the reasons for this deviation are dynamic parameters like occupants' behavior, climate, and building properties (Azar and Menassa 2011). Agent based modeling can be used to handle such dynamic parameters. For example, the dynamic nature of occupants' behavior can be correlated with its impact on energy consumption in commercial buildings (Azar and Menassa 2010, 2011) or used in managing ventilation systems in residential buildings (Jensen et al. 2016). Several bottom-up approaches have been put forward for agent based modeling. The authors in Grimm et al. (2005) have proposed a framework using a pattern oriented approach for agent based modeling to handle the complexity and uncertainty problems.
Other methods for analysis of energy systems in buildings
The reasons for analyzing data from the energy subsystems are manifold and include objectives like assessment of the overall system performance, comparison with other systems, calculation of operating costs, and prediction of energy consumption and faults. The International Energy Agency (IEA) has launched an implementing agreement (i.e., a technology initiative) called the IEA Solar Heating and Cooling Programme (SHC). Within this implementing agreement, IEA SHC Task 38 Solar Air-Conditioning and Refrigeration was one of the research topics. The IEA SHC Task 38 (subtask A3a-B3b: Monitoring procedure for solar cooling system) defines a generic monitoring policy that provides information on sensor locations and naming for the evaluation of systems, evaluation of the system performance, and comparison of different energy systems (Napolitano et al. 2011). In the literature, one can find many methods for fault detection and diagnosis (FDD) in building components. One important area is concerned with Heating, Ventilation and Air-Conditioning (HVAC) (Pietruschka et al. 2015; Isermann 2005; Fan and Qiao 2011; Katipamula and Brambley 2005; Capozzoli et al. 2015; Katipamula and Brambley 2005; Lee and Eun 2015; Narayanaswamy et al. 2014). Prior knowledge about the system can be useful in finding some of the simple undetected faults using first principles (i.e. energy balance, mass balance and other physical principles), but there is still a requirement for more sophisticated techniques to judge various aspects of a building's energy performance. One known class of techniques that makes use of historic operation data describes the behavior of the system with black box models, which are fitted using the historical data (Katipamula and Brambley 2005, 2005). Faults can also be detected in buildings with machine learning algorithms using the information from the installed electricity consumption meters, as shown in (Figueiredo et al. 2005; Domínguez et al. 2013). There are different parameters available that can be useful for the prediction of electricity consumption for each HVAC component; multivariate analysis can be used to calculate these parameters (Djuric and Novakovic 2012).
In order to detect various patterns in any energy system using data driven techniques, the focus is on extracting information from the recorded data using little to no domain expertise. There are several machine learning techniques that can be used for extracting information from the data, e.g. clustering can be used for finding similar daily performance patterns in buildings (Miller et al. 2015; Seem 2005), detecting abnormal performance from electricity consumption (Seem 2007), and further enhancing performance optimization algorithms (Kusiak and Song 2008). Moreover, at a larger scale,
wavelet transformations and clustering can be used for the classification of electrical demand profiles of buildings (Florita et al. 2013).
The data is usually stored as a time series for later analysis. Time series data can be represented with different available techniques that further help in finding the similarity between data having the same behavior. An example is the symbolic aggregate approximation (SAX), which builds on Piecewise Aggregate Approximation (PAA) and can be used to improve the speed and usability of several analysis techniques (Lin et al. 2007). The similarity between different time series can even be calculated by simply using the Euclidean distance, but the problem with this method is that even a slight shift of the data can lead to erroneous results (Lin and Li 2009). A comparison of time series similarity algorithms (Euclidean, DTW, wavelets) with the bag of patterns method using hierarchical clustering is carried out in Lin and Li (2009). The authors concluded that the bag of patterns representation (BoPR) approach performed better at finding similarities in time series data than the other methods. The bag of words model is also used in various fields for classification (Anwar et al. 2015).
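The sensitivity of the Euclidean distance to a small shift, and the way DTW compensates for it, can be illustrated with a minimal sketch. The function names `euclidean_distance` and `dtw_distance` and the toy series are illustrative assumptions, not taken from the cited works; the DTW variant shown is the textbook dynamic programming recurrence with an absolute-difference cost and no warping window.

```python
import math


def euclidean_distance(a, b):
    """Point-by-point Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Cost of the cheapest DTW alignment of a and b with |x - y| as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


series = [0, 0, 1, 2, 1, 0]
shifted = [0, 1, 2, 1, 0, 0]  # same shape, one time tick earlier (made-up data)

print(euclidean_distance(series, shifted))  # 2.0
print(dtw_distance(series, shifted))        # 0.0
```

In this toy case the one-tick shift produces a non-zero Euclidean distance although the two shapes are identical, while the warped alignment has zero cost.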
One of the well-known methods used for finding similar groups via data mining is clustering (Armano and Javarone 2013; Shah et al. 2015). Deciding the optimal number of clusters is an important issue in unsupervised methods in general, and in hierarchical clustering in particular. A clustering algorithm gives better results if the intra-cluster variations are minimal and the inter-cluster variations are maximal (Tibshirani et al. 2001). Clustering algorithms can also be used for finding various energy states in a building, e.g., k-means clustering can be used to detect the state (On/Off) of a machine, as the data toggle between these two states (Habib et al. 2015; Zucker et al. 2015a, b). Another example of using clustering for finding system states can be found in Zucker et al. (2014), where the X-means clustering algorithm is used for automatically detecting the system states (On/Off) in order to examine the operational data of adsorption chillers.
Cluster evaluation methods
Cluster evaluation is usually carried out using graphical methods. One such way is to plot an error measurement against the number of clusters. The position where the plot forms an elbow can be taken as the number of clusters, since the elbow marks the point after which the error measurement no longer decreases sharply (Ketchen and Shook 1996). There are other methods that can be used to find the optimal number of clusters in the data, e.g. the silhouette criterion (Rousseeuw 1987), the Davies-Bouldin criterion (Davies and Bouldin 1979) and the Calinski-Harabasz criterion (Caliński and Harabasz 1974). Another method from the literature is based on gap statistic analysis, wherein the gap criterion finds the optimal number of clusters by estimating the elbow location as the number of clusters with the largest gap value (Tibshirani et al. 2001). The gap value can be defined as (Tibshirani et al. 2001)

$$\mathrm{Gap}_n(k) = E_n\{\log(W_k)\} - \log(W_k), \tag{1}$$

where $E_n$ is the expected value under a sample of size $n$ from a reference distribution, $n$ is the size of the sample, $k$ is the number of clusters being evaluated, and $W_k$ is the within-cluster dispersion measurement, which can be found as

$$W_k = \sum_{r=1}^{k} \frac{1}{2 n_r} D_r, \tag{2}$$

where $n_r$ represents the number of data points in cluster $r$, and $D_r$ denotes the sum of the pairwise distances for all data points in cluster $r$.

Design of the demonstration system
This section discusses the architecture of the system that was observed for applying the proposed method. For this research, data from selected solar adsorption chillers is used for a period of 16 months, from January 2014 until April 2015. The monitoring policy and naming convention of IEA SHC Task 38 for solar cooling were followed (Napolitano et al. 2011). The design of the system is shown in Fig. 1, which shows three different cycles in the system along with the installed sensors.
The process involves three main parts, which can be summarized as follows:
The low temperature (LT) cycle represents the part of the system that handles the low temperature water produced by the chiller.
The medium temperature (MT) cycle represents the part of the system where the unwanted heat of the system is transferred to the environment using a cooling tower.
The high temperature (HT) cycle represents the part of the system where heat is provided to the chiller to produce cold water.
The 18 different parameters of interest along with their descriptions are given in Table 1.
In order to find patterns in the operational data (On cycles), different tests were performed in consultation with experts in the field. There are additional features added
for better results. The temperature difference between the return and supply temperature sensors of each cycle is used as a feature, given as

$$\Delta\mathrm{Temp\_LT} = |\mathrm{T\_LTre} - \mathrm{T\_LTsu}| \tag{3a}$$

$$\Delta\mathrm{Temp\_HT} = |\mathrm{T\_HTre} - \mathrm{T\_HTsu}| \tag{3b}$$

$$\Delta\mathrm{Temp\_MT} = |\mathrm{T\_MTre} - \mathrm{T\_MTsu}| \tag{3c}$$

These new features are combined with the other selected parameters for hierarchical clustering in the next step. Table 2 shows the features that have been used for the hierarchical clustering.

Table 1 Parameters description

| Sensor | Description |
| --- | --- |
| E6 | High temperature (HT) electricity consumption meter |
| E7 | Medium temperature (MT) electricity consumption meter |
| E8 | Low temperature (LT) electricity consumption meter |
| Q6a_m3h | HT cycle flow (water) reading |
| Q12_m3h | MT cycle flow (water) reading |
| Q7_m3h | LT cycle flow (water) reading |
| T_HTre | HT cycle temperature on return side |
| T_HTsu | HT cycle temperature on supply side |
| T_MTre | MT cycle temperature on return side |
| T_MTsu | MT cycle temperature on supply side |
| T_LTre | LT cycle temperature on return side |
| T_LTsu | LT cycle temperature on supply side |
| Q6a_KW | HT cycle energy consumption reading |
| Q12_KW | MT cycle energy consumption reading |
| Q7_KW | LT cycle energy consumption reading |
| PR6 | Pressure in HT cycle |
| PR7 | Pressure in LT cycle |
| PR8 | Pressure in MT cycle |

Table 2 Selected features for hierarchical clustering

| Feature | Description |
| --- | --- |
| ΔTemp_LT | Temperature difference of low temperature cycle |
| ΔTemp_HT | Temperature difference of high temperature cycle |
| ΔTemp_MT | Temperature difference of medium temperature cycle |
| Q6a_m3h | Flow in high temperature cycle |
| Q7_m3h | Flow in low temperature cycle |
| Q12_m3h | Flow in medium temperature cycle |
| Q6a_KW | Energy reading in high temperature cycle |
| Q7_KW | Energy reading in low temperature cycle |
| Q12_KW | Energy reading in medium temperature cycle |
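As a small illustration of the temperature difference features (Eqs. 3a to 3c), the sketch below derives them from one sensor reading. The function name `temperature_differences`, the dictionary layout and the numeric values are assumptions made for this example; only the sensor names come from Table 1.

```python
def temperature_differences(reading):
    """Return the three temperature difference features for one sensor reading."""
    return {
        "ΔTemp_LT": abs(reading["T_LTre"] - reading["T_LTsu"]),
        "ΔTemp_HT": abs(reading["T_HTre"] - reading["T_HTsu"]),
        "ΔTemp_MT": abs(reading["T_MTre"] - reading["T_MTsu"]),
    }


# Made-up temperatures, for illustration only.
reading = {
    "T_LTre": 14.0, "T_LTsu": 9.0,
    "T_HTre": 62.0, "T_HTsu": 70.0,
    "T_MTre": 30.0, "T_MTsu": 26.0,
}
features = temperature_differences(reading)
print(features)
```

Taking the absolute value makes the feature independent of whether the return or the supply side is warmer, which differs between the three cycles.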
Methods
This section describes the methodology proposed in this paper. The first step in the analysis of data is always preprocessing and finding outliers. The data used here has already been processed; therefore it can be used without the preprocessing step.
The three methods used in this work were selected in view of their independence from two main factors, viz. configuration information and domain knowledge. The algorithms used in this research do not require any domain knowledge or configuration information, as illustrated in Table 3.
On state (operational) detection using K-means clustering
The distribution of the sensor values of the chiller differs considerably between the two states (On/Off); therefore the data can be classified into two clusters. It can be observed from Fig. 2 that the mean, minimum and maximum of the two clusters differ in the temperatures, flows, energy readings and pressures. For this purpose, K-means clustering with two clusters and the Euclidean distance is used to detect the On and Off states. After the detection of the On/Off state at each point in time, consecutive On states are marked as one On cycle. The same procedure is adopted for all consecutive Off states. The sensors are placed at the outer points of the solar cooling system, which means that during the On cycle the data represent the system behavior, whereas during the non-operational period they represent the behavior of the environment. Our interest lies in finding various patterns in the chiller's data; therefore, only On cycles were considered for clustering.
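The two steps described above, two-cluster k-means followed by merging consecutive On states into cycles, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `two_means` and `on_cycles`, the deterministic initialization, the rule that the cluster farther from the origin is the On state, and the toy readings are all hypothetical.

```python
import math


def norm(point):
    """Euclidean norm of a point given as a tuple of coordinates."""
    return math.hypot(*point)


def two_means(points, max_iter=100):
    """Lloyd's k-means for k = 2 with Euclidean distance.

    Returns one label (0 or 1) per point and the two final cluster centers.
    """
    # Deterministic start (an assumption): points with smallest and largest norm.
    centers = [tuple(min(points, key=norm)), tuple(max(points, key=norm))]
    labels = []
    for _ in range(max_iter):
        labels = [0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
                  for p in points]
        updated = []
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old center if a cluster happens to be empty.
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels, centers


def on_cycles(states):
    """Group consecutive True states into (start, end) index ranges, end exclusive."""
    cycles, start = [], None
    for t, is_on in enumerate(states):
        if is_on and start is None:
            start = t
        elif not is_on and start is not None:
            cycles.append((start, t))
            start = None
    if start is not None:
        cycles.append((start, len(states)))
    return cycles


# Made-up (temperature difference, flow) pairs, one per time tick.
readings = [(0.1, 0.0), (0.2, 0.1), (5.0, 2.0), (5.2, 2.1),
            (4.9, 1.9), (0.0, 0.1), (5.1, 2.0)]
labels, centers = two_means(readings)
# Assumption: the cluster whose center lies farther from the origin is "On".
on_label = max((0, 1), key=lambda c: norm(centers[c]))
states = [lab == on_label for lab in labels]
print(on_cycles(states))  # [(2, 5), (6, 7)]
```

With real data the features would typically be scaled first, since the Euclidean distance is otherwise dominated by the sensor with the largest numeric range.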
Table 3 Selected algorithms analysis

| Methods | Algorithms | Knowledge of the field required | Configuration required |
| --- | --- | --- | --- |
| Duty cycle detection | k-means | No | No |
| Duty cycle representation | BoWR | No | No |
| Clustering | Hierarchical clustering | No | No |
Symbolic aggregate approximation (SAX) transformation
After the detection of On/Off cycles, the data will be in the form defined by Eq. 4:

$$C_i = \{S_1, S_2, S_3, \ldots, S_N\}, \tag{4}$$

where $C_i$ is the $i$th cycle and $S_t$ is the sensor value at time tick $t$. The data is normalized using Z-scores, given as:

$$Z(Data) = \frac{S_t - \mu}{\sigma}, \tag{5}$$

where $Z(Data)$ is the Z-score normalized form of the data, $S_t$ is the sensor data at the $t$th time tick, $\mu$ represents the mean and $\sigma$ is the standard deviation. After applying the Z-score normalization, and using the On/Off information from the k-means clustering algorithm, the data will be in the form defined in Eq. 6:

$$Cycle_i = \{Z_1, Z_2, Z_3, \ldots, Z_{N_i}\}, \tag{6}$$

where $Cycle_i$ is the $i$th cycle of the data; if odd values of $i$ represent Off cycles, then even values of $i$ represent On cycles. $Z_t$ is the normalized sensor value, while $N_i$ is the number of time ticks in $Cycle_i$.
Each cycle's data is first broken down uniformly into M non-overlapping sub-sequences, as in the example illustrated in Fig. 3, wherein the partitions are represented by the letters a, b, c and d. This process is called chunking, and the period (x-axis) can be of a different time length (P) depending on the application (Miller et al. 2015). The value of P is taken as 5 min in this research. The symbol of each data point is assigned according to the breakpoints. The number of breakpoints (M) taken for this research is 60. This transforms the data of each cycle to symbols. The SAX representation is specific to the length of each cycle. In order to generalize the symbolic representation for cycles of different lengths, the BoWR is used.
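The SAX steps described in this section (Z-score normalization, chunking, and symbol assignment by breakpoints) can be sketched with the Python standard library alone. This follows the common SAX formulation, in which the breakpoints are the equiprobable quantiles of the standard normal distribution; the function names, the letter alphabet (limited here to 26 symbols, fewer than the 60 used in the paper) and the toy series are assumptions for illustration.

```python
import bisect
import statistics
import string


def z_normalize(values):
    """Z-score normalization (Eq. 5); a constant series maps to all zeros."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def sax_word(values, chunk_len, alphabet_size):
    """Convert one series into a SAX word with one letter per chunk."""
    if not 2 <= alphabet_size <= 26:
        raise ValueError("this sketch supports 2 to 26 letters")
    normalized = z_normalize(values)
    # Chunking: mean of each non-overlapping chunk of chunk_len samples (PAA).
    chunks = [normalized[i:i + chunk_len]
              for i in range(0, len(normalized), chunk_len)]
    chunk_means = [statistics.fmean(chunk) for chunk in chunks]
    # Breakpoints that split the standard normal into equiprobable regions.
    normal = statistics.NormalDist()
    breakpoints = [normal.inv_cdf(i / alphabet_size)
                   for i in range(1, alphabet_size)]
    alphabet = string.ascii_lowercase[:alphabet_size]
    return "".join(alphabet[bisect.bisect_right(breakpoints, m)]
                   for m in chunk_means)


# Made-up sensor values for illustration.
print(sax_word([0, 0, 0, 0, 10, 10, 10, 10], chunk_len=4, alphabet_size=2))  # ab
print(sax_word([1, 1, 2, 2, 5, 5, 9, 9], chunk_len=2, alphabet_size=4))      # aacd
```

Because the word length grows with the cycle length, two cycles of different duration yield words of different length, which is the limitation the BoWR addresses.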
Bag of words representation (BoWR)
In order to represent the complete behavior of a cycle, with the different parameters taken into consideration, each sensor's data is converted to a BoWR of 60 characters, and these are put together in a 540 character representation, as can be seen in Fig. 4.
All the required parameters are converted to z-scores before transformation to SAX symbols. The value of M is taken as 60 for this research. $BoWR_i$ pertains to the BoWR of the $i$th On cycle containing all features (shown in Table 2). A pattern can be defined as:

$$BoWR_i = \{\mathrm{BoWR\_Sensor}_{i1}, \mathrm{BoWR\_Sensor}_{i2}, \ldots, \mathrm{BoWR\_Sensor}_{iP}\}, \tag{7}$$

$\mathrm{BoWR\_Sensor}_{iP}$ is built over a vocabulary set $\{w_1, w_2, w_3, \ldots, w_M\}$ of sensor $P$. The associated histogram vector $\mathrm{BoWR\_Sensor}_{iP}$ for the $i$th On cycle is:

$$\mathrm{BoWR\_Sensor}_{iP} = [V_{i1}\; V_{i2}\; V_{i3}\; \ldots\; V_{iM}], \tag{8}$$

where $P$ represents the features (see Table 2) selected for finding different patterns in the chiller's data. $V_{ij}$ is the number of occurrences of $w_j$ in the $i$th cycle, i.e.

$$V_{ij} = Count_i(w_j), \tag{9}$$

where the subscript $i$ in $Count_i$ refers to the $i$th cycle.
In order to handle cycles of variable time lengths, a better idea is to normalize, i.e. to use relative frequencies. With this in view, Eq. 9 can be modified as Eq. 10 below:

$$V_{ij} = \frac{Count_i(w_j)}{N_i}, \tag{10}$$

where $N_i$ represents the number of time ticks in the $i$th cycle.
Hierarchical clustering
The hierarchical clustering technique groups the data over different scales by creating a cluster tree called a dendrogram (Vesanto and Alhoniemi 2000). A dendrogram shows a multilevel hierarchy of clusters, where the clusters (groups) at one level are joined together to constitute a cluster at the next level. This property of hierarchical clustering allows one to choose the level of clustering that is most appropriate for the task at hand. The BoWR of each cycle is clustered using the hierarchical clustering technique. Figure 5 shows the dendrogram of the $BoWR_i$ given as input to the hierarchical clustering.
There are different techniques available to decide the best level or number of clusters for hierarchical clustering. One such technique is the gap method (Tibshirani et al. 2001). A clustering algorithm gives better results when the intra-cluster difference is as small as possible while the inter-cluster difference is as high as possible.
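The normalized histogram of Eq. 10 and its use as input to hierarchical clustering can be sketched as follows. This is an illustration under assumptions, not the authors' code: it uses SciPy's `linkage` and `fcluster`, a four-letter vocabulary instead of 60 symbols, two hypothetical sensors per cycle, made-up words, and the word length as the number of time ticks $N_i$.

```python
from collections import Counter

from scipy.cluster.hierarchy import fcluster, linkage


def bowr(word, vocabulary):
    """Normalized histogram (Eq. 10): relative frequency of each symbol in a word."""
    counts = Counter(word)
    return [counts[symbol] / len(word) for symbol in vocabulary]


def cycle_vector(sensor_words, vocabulary):
    """Concatenate the per-sensor histograms of one On cycle (Eq. 7)."""
    vector = []
    for word in sensor_words:
        vector.extend(bowr(word, vocabulary))
    return vector


vocabulary = "abcd"
# Each cycle holds one SAX word per sensor; cycles differ in length (made-up data).
cycles = [
    ["aaab", "ccdd"],
    ["aaaaaabb", "ccccdddd"],
    ["dddc", "aabb"],
    ["ddddddcc", "aaaabbbb"],
]
vectors = [cycle_vector(words, vocabulary) for words in cycles]

# Average linkage on Euclidean distances, then cut the tree into two clusters.
tree = linkage(vectors, method="average", metric="euclidean")
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

The first two cycles have the same symbol proportions despite their different durations, so their normalized histograms coincide and they fall into the same cluster; the same holds for the last two.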
Methodology overview
The steps involved, in the proposed method, are illustrated in Fig. 6. Below is a brief stepwise description of the method:
The first step is to find the On (operational) cycles in the data by using the k-means algorithm. This can be applied to any energy system, because the two states are present in any energy dependent system and the On duty cycle is readily detectable.
The On cycle data are transformed to symbolic data with the SAX transformation method. This step also does not need any field knowledge and is applicable to almost all energy systems.
A BoWR is created for the symbols of each On cycle. This procedure does not need any field knowledge.
The BoWRs are clustered by using hierarchical clustering for finding various operation patterns of the chiller. This process does not need any field knowledge.
The gap statistic is used to find the optimal number of clusters in the data. This procedure does not need any field knowledge.
The cluster patterns can be further investigated using the average performance indicators of each cluster.
Experiments and results
The experiments were performed on data from water based chillers. Only On cycles were considered for clustering, as these are more appropriate for finding faults in the chiller. The hierarchical clustering was applied with dynamic time warping (DTW) and with the proposed BoWR method. The hierarchical clustering performance of the two methods was compared with the help of cophenetic coefficients (Saraçli et al. 2013). The cophenetic correlation measures how strongly the cluster tree correlates with the distances between objects in the distance vector. Table 4 lists the cophenetic coefficients obtained with different hierarchical clustering methods (Levin 2007) using the BoWR and DTW techniques. The BoWR shows a strong correlation (above 0.97) with all the clustering methods, whereas the coefficients for DTW stay below 0.05. The best results for BoWR are attained with the Average method for hierarchical clustering.
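A comparison of linkage methods by cophenetic coefficient, of the kind reported in Table 4, can be sketched with SciPy. The toy points below are made up and the use of `scipy.cluster.hierarchy` is an assumption for illustration; the text does not state which software produced Table 4.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist

# Two well separated groups of made-up points.
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
distances = pdist(points)  # condensed vector of pairwise Euclidean distances

methods = ["average", "centroid", "complete", "median",
           "single", "ward", "weighted"]
coefficients = {}
for method in methods:
    tree = linkage(points, method=method)
    # cophenet returns the coefficient and the cophenetic distance vector.
    coefficient, _ = cophenet(tree, distances)
    coefficients[method] = coefficient
    print(f"{method:9s} {coefficient:.4f}")
```

A coefficient close to 1 means that the heights at which pairs of objects are merged in the tree reproduce their original pairwise distances well. For a precomputed dissimilarity such as DTW, the condensed distance vector would be passed to `linkage` in place of the raw observations.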
The first step performed for the BoWR was to find the On cycles automatically. The results of the k-means clustering algorithm can be seen in Fig. 7. The last graph in Fig. 7 shows the On/Off cycles of the chiller. It can be observed that the temperatures of the low temperature (LT), medium temperature (MT) and high temperature (HT) cycles respond according to the detected On/Off state. It is clear from Fig. 7 that during the detected On cycle, the LT temperature decreases, showing the cooling operation. At the same time, an increase in the temperatures of the HT and MT cycles of the chiller can be noticed. These simultaneous variations in temperature give a clear signal that the chiller is in operational mode, which has also been detected by the proposed k-means clustering method.
Table 4 Cophenetic coefficients of dynamic time warping (DTW) and BoWR

| No. | Clustering method | Bag of words representation (BoWR) | Dynamic time warping (DTW) |
| --- | --- | --- | --- |
| 1 | Average | 0.9897 | 0.0375 |
| 2 | Centroid | 0.9851 | 0.037 |
| 3 | Complete | 0.9753 | 0.035 |
| 4 | Median | 0.9803 | 0.0363 |
| 5 | Single | 0.9848 | 0.0414 |
| 6 | Ward | 0.9835 | 0.0363 |
| 7 | Weighted | 0.9888 | 0.0368 |

The selected features of the detected On cycles were converted to BoWR. The hierarchical clustering produces a clustering tree (dendrogram) that gives the option to select the level (cutoff) for clustering. The gap statistic (Tibshirani et al. 2001) was used to find the optimal number of clusters, depending on the gap between different clusters. As can be observed from Fig. 8, the gap statistic analysis gives the best gap value with five clusters.
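The gap computation of Eqs. 1 and 2 can be sketched as follows. This is a simplified illustration under several assumptions: clusters come from average-linkage hierarchical clustering in SciPy, $D_r$ is taken as the sum of Euclidean distances over all ordered pairs in a cluster, the reference data are drawn uniformly from the bounding box of the data, and the number of clusters is chosen as the one with the largest gap, as described in the text. All function names and the toy data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def within_dispersion(data, labels):
    """W_k of Eq. 2: sum over clusters of D_r / (2 * n_r)."""
    total = 0.0
    for r in np.unique(labels):
        members = data[labels == r]
        n_r = len(members)
        if n_r > 1:
            # pdist counts each unordered pair once; D_r counts ordered pairs.
            d_r = 2.0 * pdist(members).sum()
            total += d_r / (2.0 * n_r)
    return total


def log_dispersion(data, k):
    """log(W_k) after cutting an average-linkage tree into at most k clusters."""
    tree = linkage(data, method="average")
    labels = fcluster(tree, t=k, criterion="maxclust")
    return np.log(within_dispersion(data, labels))


def gap_statistic(data, k_max, n_ref=10, seed=0):
    """Gap(k) of Eq. 1 for k = 1..k_max, using n_ref uniform reference sets."""
    rng = np.random.default_rng(seed)
    low, high = data.min(axis=0), data.max(axis=0)
    references = [rng.uniform(low, high, size=data.shape) for _ in range(n_ref)]
    gaps = []
    for k in range(1, k_max + 1):
        expected = np.mean([log_dispersion(ref, k) for ref in references])
        gaps.append(expected - log_dispersion(data, k))
    return gaps


# Made-up data: two compact, well separated groups.
rng = np.random.default_rng(42)
data = np.vstack([rng.normal(0.0, 0.1, size=(30, 2)),
                  rng.normal(5.0, 0.1, size=(30, 2))])
gaps = gap_statistic(data, k_max=4)
best_k = int(np.argmax(gaps)) + 1  # number of clusters with the largest gap
print(gaps, best_k)
```

Note that Tibshirani et al. (2001) also describe a selection rule that takes the standard error of the reference dispersions into account; the sketch uses the simpler largest-gap rule only.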
The cluster information of the five clusters is given in Table 5. The interesting pattern groups are Cluster1 and Cluster2, as the average operation time in these clusters is around one hour or longer. For finding faults, the Cluster1 patterns are more suitable, since the average coefficient of performance (CoP) of the cycles in this cluster is only 0.16 even though their average operational time is around 68 hours, showing that the chiller's performance is bad. A majority (98.75 %) of the On cycles lies in Cluster2. The latter represents the cycles with normal operational behavior, since its average CoP is 0.54 while the average operational time of the cycles in this cluster is around one hour. Cluster3, Cluster4 and Cluster5 represent cycles with shorter operational times, during which the machine is in a transient phase; the patterns in these clusters therefore differ from the normal operational behavior of the chiller and are not plausible.

Table 5 Cluster information of the five clusters with hierarchical clustering

| Cluster_no | Percentage of cycles in cluster (%) | Average CoP of On cycles in cluster | Average time of On cycles in cluster (hours) |
|---|---|---|---|
| Cluster1 | 0.73 | 0.16 | 67.65 |
| Cluster2 | 98.75 | 0.54 | 0.95 |
| Cluster3 | 0.06 | 0.62 | 0.09 |
| Cluster4 | 0.34 | 0.87 | 0.07 |
| Cluster5 | 0.12 | 0.7 | 0.06 |

For further investigation of the cycle behaviors in Cluster1, Fig. 9 shows the behavior pattern of one of the On cycles in Cluster1. The three graphs in Fig. 9 display the temperature differences (ΔTemp_LT, ΔTemp_HT, ΔTemp_MT), flows (Q7_m3h, Q6a_m3h, Q12_m3h) and energy meter readings (Q7_KW, Q6a_KW, Q12_KW) in the low, high and medium temperature cycles, respectively. The x-axis displays the time (in minutes) for the On cycle. In each graph, the values are represented by the intensity of the color, given as a vertical color bar at the right-hand side of each plot. It can be observed from Fig. 9 that at 30 min ΔTemp_MT becomes zero, showing that the cooling tower is not operating normally and thus causes no change in the temperature of the MT cycle. It is also important to note that the flow variable (Q12_m3h) for the MT cycle shows flow throughout the cycle. Due to this effect, ΔTemp_LT also starts decreasing, and at around 80 min the chiller stops cooling. At the same time, the driving heat (ΔTemp_HT) is provided to the chiller, but the chiller is not able to match the cooling load, thus showing a lower coefficient of performance. This pattern gives a clue about faults in the chiller that need to be diagnosed. Furthermore, to support the argument that Cluster1 represents a group of On cycles with bad performance, the histogram of the CoP of the On cycles grouped in Cluster1 is shown in Fig. 10. The CoP is calculated using the following equation (Napolitano et al. 2011):

$$\mathrm{CoP} = \frac{\mathrm{Q7\_KW}}{\mathrm{Q6a\_KW}} \tag{11}$$

The histogram shows that 80 % of the On cycles have a CoP of less than 0.2, thus representing low performance of the chiller.

The same procedure was adopted to examine the behavior pattern of one of the On cycles in Cluster2, as can be seen in Fig. 11. The x-axis displays the time, in minutes, for the On cycle. In each graph, the values are represented by the intensity of the color given in the attached color bar. It can be observed from Fig. 11 that the duration of the On cycle is 195 minutes. The temperature difference parameters show that there is cooling in the LT cycle, as indicated by ΔTemp_LT over time. The effect can be seen in the MT cycle as well. At the same time, the HT cycle shows that constant driving heat was provided to the chiller. It is also important to mention that the flow variables show flow in all three cycles, and the same is observed for the energy parameters. This behavior pattern shows the normal operation of the chiller. The histogram of the CoP of the On cycles grouped in Cluster2 is shown in Fig. 12, in support of the argument that Cluster2 represents the group of On cycles corresponding to the normal performance of the chiller. The histogram shows that the chiller performs with a CoP between 0.4 and 0.8, thus representing the normal behavior of the chiller.
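The per-cycle CoP of Eq. (11) and per-cluster figures of the kind reported in Table 5 could be computed along the following lines. This is a sketch under stated assumptions: the readings are taken to be sampled at equal intervals, so that the ratio of summed Q7_KW to summed Q6a_KW approximates the CoP of a cycle, and the record layout (`cluster`, `cop`, `hours`) and function names are hypothetical.

```python
from collections import defaultdict


def cycle_cop(q7_kw, q6a_kw):
    """CoP of one On cycle following Eq. (11): cooling power over driving heat.

    Assumption: equally spaced samples, so sums stand in for energies.
    """
    heat = sum(q6a_kw)
    return sum(q7_kw) / heat if heat > 0 else float("nan")


def summarize(cycles):
    """Per-cluster share of cycles (%), mean CoP and mean duration (hours).

    cycles: list of dicts with the keys 'cluster', 'cop' and 'hours'.
    """
    groups = defaultdict(list)
    for c in cycles:
        groups[c["cluster"]].append(c)
    total = len(cycles)
    return {
        key: {
            "share_pct": 100.0 * len(g) / total,
            "mean_cop": sum(c["cop"] for c in g) / len(g),
            "mean_hours": sum(c["hours"] for c in g) / len(g),
        }
        for key, g in groups.items()
    }


def share_below(values, threshold):
    """Percentage of values strictly below threshold (histogram-style check)."""
    return 100.0 * sum(v < threshold for v in values) / len(values)


# Toy usage with made-up cycles.
rows = [
    {"cluster": 1, "cop": cycle_cop([1.0, 1.0], [4.0, 4.0]), "hours": 3.0},
    {"cluster": 2, "cop": cycle_cop([2.0, 2.0], [4.0, 4.0]), "hours": 1.0},
    {"cluster": 2, "cop": 0.5, "hours": 1.0},
    {"cluster": 2, "cop": 0.5, "hours": 1.0},
]
print(summarize(rows))
```

Applying `share_below` to the CoP values of one cluster with a threshold of 0.2 gives the kind of statement made above about the Cluster1 histogram.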
Comparison of proposed method with CAS modeling
The main point of this research is to find various patterns in the operation of the energy system in buildings using the minimum possible input from engineers. For the analysis of the energy system, the data has been selected using IEA SHC Task 38. The issues that may surface while modeling a current system using CAS can be traced back to the need for complete knowledge of the system, its behaviors or states and the interaction of the subsystems; this is a problem of scale, dealing with a very large state-space representation. Secondly, complex dynamic systems require transitions between completely different behaviors in the form of what are called phase transitions. Hence, critical transition detection requires a detailed state-space model.
Conclusions
The main goal of this research work is to provide analysis algorithms that automatically find the various patterns in the energy system of a building using as little configuration or field knowledge as possible. A bag of words representation method with hierarchical clustering has been proposed to assess the performance of a building energy system. In the first phase, a k-means clustering algorithm is used to find the On (operational) cycles of the chiller. These On cycles are represented with symbols using the symbolic aggregate approximation (SAX) method. The symbolic representation is then transformed to BoWR, which is provided to hierarchical clustering. The proposed method has been compared with the dynamic time warping (DTW) method using cophenetic coefficients, and BoWR has been shown to produce better results than DTW. The results of BoWR are further investigated, and gap statistics have been used to find the optimal number of clusters. Finally, interesting patterns of each cluster are discussed in detail.
In future, the current research can be used in the field of automatic fault detection and diagnostics (FDD) in buildings, as it helps in finding the different performance patterns. This would help experts in the field to look only at those areas where the performance is bad. Further research is needed in order to find intelligent ways of diagnosing the faults.
Authors' contributions
UH, KH and GZ conceived and designed the experiments. The experiments were performed by UH. The data were analyzed by UH, KH and GZ. The paper was written by UH, KH and GZ. All authors read and approved the final manuscript.
Author details
1 Energy Department, AIT Austrian Institute of Technology, Giefinggasse 2, 1210 Vienna, Austria. 2 College of Arts and Sciences, University of Nizwa, Nizwa, Sultanate of Oman. 3 Computer Science Department, COMSATS Institute of Information Technology, Abbottabad, Pakistan.
Acknowledgements
This work was partly funded by the Austrian Funding Agency in the funding programme e!MISSION within the project extrACT, Project Number 838688.
Competing interests
The authors declare that they have no competing interests.
Received: 1 February 2016 Accepted: 13 May 2016
References
Andrii C (2014) Exploring behavioral patterns in complex adaptive systems. PhD thesis, University of Pittsburgh, Pennsylvania
Anwar H, Zambanini S, Kampel M (2015) Efficient scale and rotation invariant encoding of visual words for image classification. IEEE Signal Process Lett 22(10):1762-1765
Armano G, Javarone MA (2013) Clustering datasets by complex networks analysis. Complex Adapt Syst Model 1(1):5
Avram V, Rizescu D (2014) Measuring external complexity of complex adaptive systems using Onicescu's informational energy. Mediterr J Soc Sci 5(22):407
Azar E, Menassa CC (2011) Agent-based modeling of occupants and their impact on energy use in commercial buildings. J Comp Civ Eng 26(4):506-518
Azar E, Menassa C (2010) A conceptual framework to energy estimation in buildings using agent based modeling. In: Proceedings of the 2010 winter simulation conference (WSC), pp 3145-3156
Batty M (2007) Cities and complexity: understanding cities with cellular automata, agent-based models, and fractals. The MIT press, Massachusetts
Caliński T, Harabasz J (1974) A dendrite method for cluster analysis. Commun Stat 3(1):1-27
Capozzoli A, Lauro F, Khan I (2015) Fault detection analysis using data mining techniques for a cluster of smart office buildings. Expert Syst Appl 42(9):4324-4338
Davies DL, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell 1(2):224-227
Djuric N, Novakovic V (2012) Identifying important variables of energy use in low energy office building by using multivariate analysis. Energy Build 45:91-98
Domínguez M, Fuertes JJ, Alonso S, Prada MA, Morán A, Barrientos P (2013) Power monitoring system for university buildings: architecture and advanced analysis tools. Energy Build 59:152-160
Fan W, Qiao P (2011) Vibration-based damage identification methods: a review and comparative study. Struct Health Monit 10(1):83-111
Faucher JB (2010) Reconceptualizing knowledge management: knowledge, social energy, and emergent leadership in social complex adaptive systems. PhD thesis, University of Otago, Dunedin
Figueiredo V, Rodrigues F, Vale Z, Gouveia JB (2005) An electric energy consumer characterization framework based on data mining techniques. IEEE Trans Power Syst 20(2):596-602
Florita AR, Brackney LJ, Otanicar TP, Robertson J (2013) Classification of commercial building electrical demand profiles for energy storage applications. J Solar Energy Eng 135(3):031020-031020
Grimm V, Revilla E, Berger U, Jeltsch F, Mooij WM, Railsback SF, Thulke H-H, Weiner J, Wiegand T, DeAngelis DL (2005) Pattern-oriented modeling of agent-based complex systems: lessons from ecology. Science 310(5750):987-991
Habib U, Jørstad I, Thanh DV, Khan IA (2011) A framework for secure linux based authentication in enterprises via mobile phone. J Basic Appl Sci Res 1(12):3058-3066
Habib U, Zucker G (2015) Finding the different patterns in buildings data using bag of words representation with clustering. In: 2015 13th International conference on Frontiers of information technology, pp 303-308
Habib U, Zucker G, Blöchle M, Judex F, Haase J (2015) Outliers detection method using clustering in buildings data. In: Industrial electronics society, IECON 2015 - 41st Annual Conference of the IEEE, pp 000694-000700
Hadzikadic M (2010) Energy in the context of complex adaptive systems: Predator-prey dynamics. In: LAWDN-Latin-American workshop on dynamic networks, p 1
Iglesias F, Kastner W (2013) Analysis of similarity measures in times series clustering for the discovery of building energy patterns. Energies 6(2):579-597
Isermann R (2005) Model-based fault-detection and diagnosis - status and applications. Ann Rev Control 29(1):71-85
Jensen T, Holtz G, Baedeker C, Chappin J (2016) Energy-efficiency impacts of an air-quality feedback device in residential buildings: an agent-based modeling assessment. Energ Build 19(1):4
Katipamula S, Brambley MR (2005) Review article: methods for fault detection, diagnostics, and prognostics for building systems - a review. HVAC&R Res 11(1):3-25
Katipamula S, Brambley MR (2005) Review article: methods for fault detection, diagnostics, and prognostics for building systems - a review. HVAC&R Res 11(2):169-187
Kayman EA (2014) Chaos in education as an intelligent complex adaptive system. Chaos and complexity theory in world politics 280
Keogh E, Ratanamahatana CA (2004) Exact indexing of dynamic time warping. Knowl Inf Syst 7(3):358-386
Ketchen DJ, Shook CL (1996) The application of cluster analysis in strategic management research: an analysis and critique. Strateg Manag J 17(6):441-458
Khan A, Hornbæk K (2011) Big data from the built environment. Proceedings of the 2nd International Workshop on Research in the Large, LARGE '11. ACM, New York, pp 29-32
Korhonen J, Snäkin J-P (2015) Quantifying the relationship of resilience and eco-efficiency in complex adaptive energy systems. Ecol Econom 120:83-92
Kusiak A, Song Z (2008) Clustering-based performance optimization of the boiler-turbine system. IEEE Trans Energ Convers 23(2):651-658
Lee ET, Eun HC (2015) Damage identification through the comparison with pseudo-baseline data at damaged state. Eng Comp 40:1-8
Levin SA (1998) Ecosystems and the biosphere as complex adaptive systems. Ecosystems 1(5):431-436
Levin MS (2007) Towards hierarchical clustering (Extended Abstract). In: Diekert V, Volkov MV, Voronkov A (ed) Computer Science - theory and applications: proceedings of second international symposium on computer science in Russia, CSR 2007, Ekaterinburg, pp 205-215
Lin J, Keogh E, Wei L, Lonardi S (2007) Experiencing SAX: a novel symbolic representation of time series. Data Min Knowl Discov 15(2):107-144
Lin J, Li Y (2009) Finding structural similarity in time series data using bag-of-patterns representation. In: Winslett M (ed) Scientific and statistical database management, vol 5566, Lecture notes in computer science. Springer, Berlin, pp 461-477
Miller C, Nagy Z, Schlueter A (2015) Automated daily pattern filtering of measured building performance data. Autom Constr 49:1-17
Moffat J (2010) Complexity theory and network centric warfare. DIANE Publishing, Pennsylvania
Mourad M, Bertrand-Krajewski JL (2002) A method for automatic validation of long time series of data in urban hydrology. Water Sci Technol 45(4-5):263-270
Napolitano A, Sparber W, Thür A, Finocchiaro P, Nocke B (2011) Monitoring procedure for solar cooling systems. Technical Report IEA Task 38, international energy agency
Narayanaswamy B, Balaji B, Gupta R, Agarwal Y (2014) Data driven investigation of faults in HVAC systems with model, cluster and compare (MCC). In: Proceedings of the 1st ACM conference on embedded systems for energy-efficient buildings, BuildSys '14. ACM, New York, pp 50-59
Olsson P, Folke C, Berkes F (2004) Adaptive comanagement for building resilience in social-ecological systems. Environ Manag 34(1):75-90
Oosterhuis K (2012) Simply complex, toward a new kind of building. Front Arch Res 1(4):411-420
Pietruschka D, Dalibard A, Ben I, Focke H, Judex F, Preisler Helm M, Ohnewein P, Frein A, Muscher M (2015) Report for self-detection on monitoring procedure. Technical Report IEA Task 48/B6, international energy agency
Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comp Appl Math 20:53-65
Saraçli S, Doğan N, Doğan İ (2013) Comparison of hierarchical cluster analysis methods by cophenetic correlation. J Inequal Appl 2013(1):1-8
Seem JE (2005) Pattern recognition algorithm for determining days of the week with similar energy consumption profiles. Energy Build 37(2):127-139
Seem JE (2007) Using intelligent data analysis to detect abnormal energy consumption in buildings. Energy Build 39(1):52-58
Shah MA, Abbas G, Dogar AB, Halim Z (2015) Scaling hierarchical clustering and energy aware routing for sensor networks. Complex Adapt Syst Model 3(1):5
Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc 63(2):411-423
Vesanto J, Alhoniemi E (2000) Clustering of the self-organizing map. IEEE Trans Neural Netw 11(3):586-600
Zucker G, Habib U, Blöchle M, Judex F, Leber T (2015) Sanitation and analysis of operation data in energy systems. Energies 8(11):12776-12794
Zucker G, Habib U, Blöchle M, Wendt A, Schaat S, Siafara LC (2015) Building energy management and data analytics. In: 2015 international symposium on smart electric distribution systems and technologies (EDST), pp 462-467
Zucker G, Malinao J, Habib U, Leber T, Preisler A, Judex F (2014) Improving energy efficiency of buildings using data mining technologies. In: 2014 IEEE 23rd international symposium on industrial electronics (ISIE), pp 2664-2669
The Author(s) 2016
Abstract
Purpose
Due to the large quantity of data that are recorded in energy efficient buildings, understanding the behavior of various underlying operations has become a complex and challenging task. This paper proposes a method to support analysis of energy systems and validates it using operational data from a cold water chiller. The method automatically detects various operation patterns in the energy system.
Methods
The use of k-means clustering is being proposed to automatically identify the On (operational) cycles of a system operating with a duty cycle. The latter's data is subsequently transformed to symbolic representations by using the symbolic aggregate approximation method. Afterward, the symbols are converted to bag of words representation (BoWR) for hierarchical clustering. A gap statistics method is used to find the best number of clusters in the data. Finally, operation patterns of the energy system are grouped together in each cluster. An adsorption chiller, operating under real life conditions, supplies the reference data for validation.
Results
The proposed method has been compared with dynamic time warping (DTW) method using cophenetic coefficients and it has been shown that the BoWR has produced better results as compared to DTW. The results of BoWR are further investigated and for finding the optimal number of clusters, gap statistics have been used. At the end, interesting patterns of each cluster are discussed in detail.
Conclusion
The main goal of this research work is to provide analysis algorithms that automatically find the various patterns in the energy system of a building using as little configuration or field knowledge as possible. A bag of word representation method with hierarchical clustering has been proposed to assess the performance of a building energy system.




