Modelling of environmental parameters in lake environments using neural networks

Chapter 1 Mathematical models are having a major role in the area of environmental/ ecological sciences, including the field of limnology. With the use of mathematical models several biotic and abiotic parameters can be used as model’s input parameters, in order to simulate the arithmetical values of an examined parameter. Based on the mathematical models’ simulations associations/results can be derived about the behavior/impact of the environmental parameters (e.g. association of the soluble reactive phosphorus (SRP) values with the chlorophyll levels). Especially for the chlorophyll parameter simulation, it is found that the artificial neural networks are a better simulation technique comparing with other modelling methods. Additionally, it was found that the artificial neural networks are better to find/examine the patterns among several environmental parameters. This is attributed to the fact that the artificial neural networks are not affected by the non-linearity or the heterogeneity of the simulated system, while it must be noted that the majority of the ecological phenomena are characterized by non-linear behavior.

I would like to thank Prof. Eva Papastergiadou (again!) and Dr Kostas Stefanidis for sharing with me the limnological data, based on which the "core" of this PhD thesis was developed. I am glad that I had the chance to cooperate with Dr Kostas Stefanidis, since his contribution to my simulated data implementation was really useful.
Special thanks are devoted to my parents (Diamanto and Christos) and my brother Prokopis for their support, patience and love. Also, special thanks belong to my husband Stratos, for his love and support.

Dedication
To F. and M. for making me smile! Abstract Aquatic pollution is one of the most serious environmental issues that the human kind has to deal, especially the last decades. Therefore, the aquatic environments are suffering a great environmental pressure as well. Since the lakes are having about the 90% of the Earth's liquid freshwater, their proper management is a crucial task in order to maintain their good water quality. Eutrophication is one of the major problems that are observed in the limnetic environments. The eutrophic water is related with a series of environmental problems like anoxia, the existence of harmful cyanotoxins etc. Therefore, this study is focused on the assessment of the environmental parameters that are related with the phenomenon of the eutrophication.
The mathematical models are capable to contribute positively to the restoration of the good water quality status of a limnetic environment. The mathematical models can act as a mean to describe several environmental processes with the use of mathematical relationships. Several types of mathematical models have been applied into the area of environmental sciences, including the artificial neural networks. During the last decades the artificial neural networks have been applied successfully into the field of limnology. It has been documented that the artificial neural networks are superior to other modelling techniques (e.g. multiple linear regression model). This is attributed to the fact that the artificial neural networks can simulate with good correlation the complex non-linear relationships that are describing the environmental processes.
Therefore, several categories of artificial neural networks are applied in this study. The artificial neural networks can be divided into two main categories, those with supervised learning and those with non-supervised learning. In Application 1 and Application 2 supervised artificial neural networks are developed aiming to predict the chlorophyll parameter. Afterwards several sensitivity analysis (one way-sensitivity analysis) algorithms are applied in order to evaluate the environmental parameters with the biggest impact on the artificial neural network, which managed to simulate well (produced small error) the chlorophyll parameter. The synergistic effect of the parameters (two-way sensitivity analysis) is calculated with the use of the "PaD2" algorithm.

Overview of hydrological and water quality models.
Hydrological and water quality models can be considered as a mathematical representation of an aquatic ecosystem, that is trying to describe/mimic a dynamical or static phenomenon of the aquatic system's by integrating several related physical, biological and chemical variables. An example of a mathematical model applied in a limnological application is described in a study of Mellios et al. (2015), where several abiotic and biotic parameters are acting as input variables for the model and the model's structure is analytically represented in a diagram (see Figure 1.1). The hydrological and water quality models are mainly used to assess the impacts of climate change, land use, land and crop management practices on the quality and quantity of water resources (Marias et al., 2012).
The significance of water quality models is huge and multifold, since they can contribute into reducing the cost of expensive experiments by predicting an expensive to measure parameter or by simulating events taking place into inaccessible sites, because of serious environmental pollution issues (Wang et al., 2013). Furthermore, the water quality models can predict the interaction among the water quality parameters and the impact/effect of each environmental parameter on the simulated output parameter (Hadjisolomou et al., 2016), providing the associated authorities with possible managing scenarios for handling a water quality problem. A well constructed water quality model must meet some basic criteria in order to be considered suitable for modelling a particular purpose, such as those described bellow (Hajigholizadeh et al., 2018):  Take into consideration the dataset's nature requirements, that are associated with the phenomenon to be simulated. The need for scientifically based methods for managing water resources, led to the creation of many water quality models (Tsakiris and Alexakis, 2012). A review of water quality models is presented in the study of Gao and Li (2014), where among them the three most well-known models in hydrological studies are mentioned to be: the Soil Water Beside the above-mentioned models or the several others well known hydrological models like the QUALs; HSPF; CE-QUAL-W2; ELCOM-CAEDYM; EFDC (Wang et al., 2013;Gao and Li, 2014), new tendencies for future hydrological modelling applications have been raised and are analyzed as following (Gao and Li, 2014): I. Combination models: they are consisted of a combination of two or more individual models that are simulating a different phenomenon.

II.
Artificial Intelligence based models: these non-mechanistic models are models like artificial neural networks; genetic algorithms; support vector machines and fuzzy reasoning.

III.
System integration models: the integration of remote sensing system (RSS); geographical information system (GIS); global position system (GPS), which are called the 3S and can collect and process massive amounts of hydrological data.

The role of Artificial Neural Networks (ANNs) in Eco-hydrological
Modelling with an emphasis on algal production.
Artificial Neural Networks (ANNs) are widely used in hydrological modelling during the last decades, mainly because their theoretical background is evolving and the computational processing power is increasing . Initially only few applications of the ANNs dealing with environmental and ecological aspects were reported at the beginnings of the 90s (Lek and Guegan, 1999). The application of ANNs for simulating trophic function and harmful algal blooms is also a relative recent task (Teles et al., 2006).
Most of the water quality studies, that are using ANNs as modelling tools, are dealing with limnological or riverine applications, while ANN applications examining coastal systems are very scarce (Chau, 2006). Specifically, for eutrophication related problems and their catastrophic effects, the use of ANN's predictions can prevent or minimize the effects of any possible HABs (Devillers, 2009). The ANN's well known abilities to model complex and nonlinear relationships is ideal for eutrophication modelling. Based on these abilities ANNs are considered as a useful management tool regarding lake eutrophication (Karul et al., 1999).
Besides the predicting ability of ANNs to simulate algal productivity with a good accuracy, ANNs are good at examining the effect of the related water quality parameters based on their ability to associate them with the function of the algal biomass (Kilic et al., 2006). This is can be archived based on several mathematical relationships that are examining the sensitivity analysis for each environmental parameter and the associated impact on the production of algal biomass . Generally, the results of the sensitivity analysis of a trained ANN model are providing the modelers with useful information about the water quality parameters interrelationships in an ecosystem (Recknagel, 2001). A more detailed presentation of the sensitivity analysis methods is given in a following chapter. As stated by Chen and Zhang (2009) the ANNs can be meaningfully applied for the examination of effect-relations and the impact of several environmental parameters on a complex environmental system, in contrast with other conventional methods.
Besides the usage of sensitivity analysis, in their study Millie et al. (2012) are applying ANNs to model the microalgal abundance and are proposing a more sophisticated way to exam the relationships between the associated water quality parameters and the Chl-a. As stated by the above authors, this proposed method is based on a pedagogical knowledge extraction method by using the multi-dimensional response surfaces and by that way it was possible the creation of mathematical equations of iterative predictive outcomes of Chl-a. In another ANN study of Millie et al. (2014), a similar methodology is used to produce mathematical equations between the Chl-a and the associated water quality parameters and to visualize the Microcystins along with the associated environmental parameters interactions.
These two mentioned studies are presented in a more detailed way in a following chapter, since they provide an alternative mathematical way to associate the Chl-a and the several water quality parameters.
Comparing classical modelling techniques used in hydrological modelling (including water quality related problems), it is concluded that the ANN modelling techniques are superior and produce more reliable results. This is attributed to the fact that the data driven modelling techniques, like ANNs, are more advantageous for water quality modelling comparing with the classical process-based modelling techniques. Since the process-based techniques are usually requiring a large amount of monitored data samples and a large number of environmental variables , while the data driven modelling techniques are requiring fewer input parameters than process-based techniques and are computationally faster (Palani et al., 2008). Generally the process-based modelling techniques are less reliable than the data driven techniques, because they require too many data and parameters for calibration and their results are usually restricted to the time and sites that they were developed within (Chen et al., 2010). Based on the relative literature examination it is found that the ANNs are better predictors regarding water quality modelling comparing with general linear models, however some exceptions exist when a general linear regression model outperforms an ANN model (Ozesmi et al., 2006).
In the study of Jorgensen (2008) an overview of the major model types that are used in ecological modelling is presented: bio-chemical and bio-energetics dynamic models; static models; population dynamic models; structurally dynamic models; fuzzy models; artificial neural network; individual-based models and cellular automata; spatial models; ecotoxicological models; stochastic models; hybrid models. A comparison of the ANNs against the rest modelling methods is presented by the author and the main advantages of the ANNs are given as follow (Jorgensen, 2008):  ANNs are easy to be applied.
 They are not restricted of the data set heterogeneity.
 Can be used in cases when other modelling methods fail to be applied.
 Based on the test set usage they are evaluated and give a reliable indication of the certainty.
Where their main disadvantages of the ANNs comparing with the rest modelling techniques are associated with the following as stated by Jorgensen (2008):  Bio-geo-chemical models that are based on the conservation principles cannot be replace by the ANNs.
 ANNs are not able to examine the causality of the simulated processes, unless they are combined or introduced with an algorithm.
 ANNs sometimes are having limited accuracy.
Even though ANNs are having better modelling results than other modelling methods, sometimes even the ANNs fail to produce good enough output results, however ANNs are flexible modelling techniques and an experienced modeler might produce improved outputs.
For example this problem is examined in the study of Scardi (2001), which is dealing with the problem of limited available data when modelling with ANN techniques the phytoplankton primary production and for this purpose co-predictors (defined as variables that are correlated with the ones to be predicted, but not directly related to them) and the metamodelling procedure (when using additional information derived from other models in order to train, validate and test an ANN model) are used in order to improve previously constructed ANNs and as it was concluded by the author the ANNs are considered black-box models that their performance has the potential to be improved after applying some "modelling tricks". Another category of ANNs broadly used in the eco-hydrological field is the one of the unsupervised ANNs, known as the Kohonen Self-Organizing Maps (SOMs). These SOMs are mainly used for clustering/pattering environmental data and for exploratory data analysis (data mining) of the environmental data set. As stated by Park et al. (2003) usually the conventional multivariate analyses have been used, but ANNs are more suitable for ecological patterning because the interactions in the data set, which is consisted of many species and sampling areas, are nonlinear and complex.
An example of a SOM applied in eco-hydrological modelling is found in the study of Lu and Lo (2002), where a trophic state classifier was constructed based on a SOM model, aiming to diagnose the water quality of the Fei-Tsui Reservoir (Taiwan) during the monitoring period 1987-1995 and compared the simulated SOM results with those of the Carlson Index.
In another application given by    (2018); Ceccaroni et al. (2018). In the study of Chau (2006) an extensive review of several AI techniques like ANNs, genetic algorithms, knowledge-based systems and their integration into eco-hydrological modelling are presented. The most popular AI techniques that are found in eco-hydrological modelling studies are the ANNs and fuzzy controllers (Kruszewski et al., 2008;Han and Qiao, 2011). In the study of Millie et al. (2012) it is stated that "ANNs are a core form of Artificial Intelligence models that discern complex associations among variables through iterative and repetitive data presentation".
Artificial Neural Networks are considered to be a very promising machine learning method, since they can simulate non-linear relationships and to assess interactions between the environmental variables (Lek and Guegan, 1999;DeWeber and Wagner, 2014), while in many cases these environmental parameters are interrelated (Ahmed, 2014). Furthermore, the fact that the application of an ANN does not require a priori knowledge of the underlying mechanism/process, established the ANNs to be a suitable tool for handling various hydrological problems. Therefore the number of ANN's applications in eco-hydrological studies is increasing rapidly over the last decades (Faruk, 2010;Basant et al., 2010;Sarkar and Pandey, 2015). The fact that the ANNs can obtain high simulation and forecasting accuracy in eco-hydrological modelling established them as a management tool .
Generally, the ANNs are considered as 'black box' models by many researchers (Lek and Guegan, 1999;Kalteh et al., 2008;Ahmed, 2014), requiring no detailed information about the system (Ahmed, 2014). However, the term 'Grey-Box' is most suitable according to Millie et al. (2012), because based on heuristic knowledge extraction technique from a trained ANN the environmental interactions for the Chl-a parameter can be quantified through the summation of the related response-surface equations A very simple/primary definition of the ANN is that an ANN can be considered as a computing system, which is consisted of a highly interconnected set of simple information processing elements (units), analogous to a biological neuron (Sarkar and Pandey, 2015). The aim of an ANN is to create a model of the data-generated weight process, which can generalize and predict outputs from data that haven't been presented before (unknown / unseen) to the ANN model (Lee et al., 2013). ANNs are inspired by the ability of human beings to perform well complex tasks through the processing of the biological neural system (Han et al., 2012). As in nature, the biological neural system function is determined by connections through the elements/neurons (Sener et al., 2012).
The importance of neurons is significant for ANN modelling and as is stated by Munakata, (2008): "the way a natural neuron is the building block of the brain, the basic element of every ANN is an artificial neuron". A figure of a biological neuron is provided below (see Figure 2.1), where the comparison/parallelization with an artificial neuron is observed. The neuron is consisted by three elements: the dendrites, cell body and the axon.
The parallelization between a biological neuron and an artificial neuron is that the connections between the nodes can be parallelized with the axon and dendrites; the biological synapses can be parallelized with the connection weights; the threshold of the artificial neuron is approximating the activity in the cell body/soma (Sanchez Mesa et al., 2005).

Feed-forward multi -layer perceptron neural network
ANNs can be categorized into two groups, supervised and unsupervised. In the case of ANNs with supervised learning the given data is consisted of inputs and associated outputtargets. In contrast, ANNs with unsupervised learning, in their given data set have no outputtargets, therefore their aim is to discover patterns among inputs (Engelbrecht, 2007). The Kohonen's Self-Organizing Map (SOM) is the most famous unsupervised ANN, as it is analyzed by Kohonen, (1990). A detailed presentation of SOM neural networks is given in Chapter 3.
The ANNs have been described as parallel-distributed information processing systems that are mimicking the way the biological neural networks are processing the information and are having the ability to process/solve large-scale complex problems (Kalteh et al., 2008).
Because the ANNs are highly parallel architectures (meaning that their numerous independent operations can be executed simultaneously), the ANNs are the most preferred technique today for high speed processing of huge data sets (Sarkar and Pandey, 2015). The learning mechanism of ANNs was defined by Ball and De la Rosa (2006) as: "a computational mechanism able to acquire, represent, and compute a weighting (or mapping) from one multivariate space of information to another, given a set of data representing that mapping".
ANNs with skip layer connections are ANNs with direct connections between the input layer neurons and the output layer neurons, therefore linear relationships between the input variables and the simulated output are resulting (DeWeber and Wagner, 2014). Such ANN models, which have no hidden layers, are considered analogous to a linear model, while the increasing nonlinearities in the modeled relationships are analogous to the increasing number of hidden neurons (Cheng and Titterington, 1994;DeWeber and Wagner, 2014).
The most commonly used ANN in eco-hydrological modelling is the feed-forward multi-layer Perceptron (MLP) with back-propagation training algorithm (Maier and Dandy, 2000;Kalteh et al., 2008, Najah et al., 2011. The MLP neural networks are closely related to statistical models and are the most suitable type of ANNs for forecasting a phenomenon (Rumelhart et al., 1986;Najah et al., 2011). Additionally, as it is stated by Hsu et al. (1995) the three-layer feed-forward MLP can simulate successfully any of the real-world functional relationships, which may have a poorly defined form and also may have an unknown complexity. This means that the feed-forward neural network with one layer can approximate any smooth, measurable function by associating the output variable with the input variables with the use of connecting weights and transfer functions (Basant et al., 2010;Areerachakul et al., 2013). Therefore, the multilayer feed-forward neural networks can be characterized as "universal approximators" capable of learning any continuous function (Hornik et al., 1989;Cybenko, 1989).
Feedforward networks are suitable for function approximation (Gupta et al., 2003). A feedforward ANN, as it can be derived by its characterization, has an acyclic topology with no feedbacks loops among its layers and it is used for approximating non-linear mappings among inputs and outputs (Hu and Hwang, 2002). An ANN is divided by at least three layers. The input layer that imports the input parameters to the network, then one or more hidden layers and the output layer that gives the result. Each layer is consisted of neurons, also called nodes, as a neural network resembles a graph (Kasabov, 1998). The architecture of a feedforward neural network is graphically illustrated as in the figure below (Figure 2.2).
For every neuron there is synaptic weight that connects the specific neuron with every neuron of the next layer. Each neuron has a number of inputs, with exception those of the first layer, and one output (Dedecker et al., 2004). Feedforward networks are divided into at least three layers and each layer is consisted by neurons. The first layer is the input layer, at least one intermediate hidden layer and the output layer that produces the result. Each neuron of a layer is connected with all the neurons of the next layer with a synaptic weight. Aggregation is performed for every neuron on its weighted inputs from the previous layer and an output is yielded through a transfer function or else activation function (Veelenturf, 1995;Recknagel et al, 1997;Gupta et al., 2003). The output value of the jth neuron (oj) is given by the equations where f is the transfer function, xi is the input from ith neuron belonging to the immediate previous layer, wij is the synaptic weight that connects xi with the jth neuron and zj a bias term.
The output of each neuron is computed and propagated through the next layer until the last layer, producing a network output that is compared with the given output (Ghalkhani et al. 2013).
ANNs are trained with the use of a learning algorithm. Feedforward networks can be trained with the use of back-propagation algorithm's variations. During the learning procedure the synaptic weights are adjusted so they are minimizing an error function, usually the mean square difference between the predicted and the given output (Vilas et al., 2011). The ANN's training process is repeated until converge and the ANN is able to produce good outputs when is given unseen (unknown) input data (Kalteh et al., 2008). After the ending of the training phase, the ANN should be able to simulate the input data and produce reliable output values (Lee et al., 2013).
Back-propagation is the most famous and frequently used algorithm for training feedforward MLP (Haykin, 1999;Kalteh et al., 2008;Areerachakul et al., 2013;Lee et al., 2013;Zhang and Ding, 2017). The Levenberg-Marquardt algorithm, which is a modification of the back-propagation algorithm, is preferred for the training of moderate-sized feed-forward ANNs, because it is reported to be the fastest method (Demuth and Bealy, 2002;Simon, 2005;Cho et al., 2014). According to Ding et al. (2014) the application of the Levenberg-Marquardt optimization algorithm may shorten the learning time and improve the learning speed, because ANNs trained with the back-propagation algorithm have problems related with the existence of local minimum in convergence learning and slow convergence rates. Some variations of the backpropagation algorithm are the Levenberg-Marquardt algorithm, back-propagation with momentum, BFGS Quasi-Newton algorithm, resilient backpropagation algorithm and scaled conjugate gradient algorithm (Hagan et al., 1996;Demuth and Beale, 2002 where J stands for the Jacobian matrix of the first derivatives of the network's errors with respect to the weights and biases and e is a vector of network errors. I is the identity matrix and J T is the transpose matrix of matrix J. The μ is a scalar, that when approximates zero, then the Levenberg-Marquardt algorithm becomes the Newton's algorithm and when μ is large enough the Levenberg-Marquardt algorithm becomes the gradient descent method (Hagan et al., 1996;Demuth and Beale, 2002).    Hyperbolic tangent activation function, which is a sigmoid function and is having the following mathematical form: The hyperbolic tangent function can be graphically represented based on Figure   n, where its nonlinear nature can be observed.

ANN's topology selection
The number of the ANN's hidden neurons cannot be calculated on a "magic formula" and some rules of thumb exist in order to calculate the ANN's number of hidden neurons (Gazzaz et al., 2012). Some researchers stated that the selection of the optimal/proper number of an ANN's hidden neurons is a trial and error procedure (e.g. Ozesmi et al., 2006;Palani et al.;2008;Dogan et al., 2009;Basant et al., 2010;Liu and Chen, 2012), meaning that the determination of the hidden neurons number is problem depended. Meanwhile, some other researchers suggested some heuristic rules in order to find the optimal number of hidden neurons. Some of these heuristic rules are summarized by Goethals et al. (2007)  An alternative heuristic rule for calculating the optimal neurons number is proposed by , which is based on the combination of the following two equations Generally, it must be stated that the fewer the hidden neurons the better, since the created ANN is less prone to overfitting the unknown data and has better generalization abilities (e.g. Ozesmi et al., 2006;Palani et al.;2008). The overfitting is a situation when the computational error of the training set is having a small value, while for the test set the ANN is having large value for the computational error (Dogan et al., 2009). This is happening because the ANN has managed to memorize well the training set, but it is unable to generalize well to new situations (meaning to produce good output for new-unknown data inputs). When the number of hidden neurons is too small, then the ANN model might be unable to produce good simulated outputs because the ANN doesn't have sufficient degrees of freedom to learn the process correctly (Kisi, 2005;Basant et al., 2010;Liu and Chen, 2012). While if the number of ANN's hidden neurons is too large, then the ANN might overfit the data and the ANN's training may take too much time (Karunanithi et al., 1994;Kisi, 2005;Liu and Chen, 2012).
Besides finding the optimal number of neurons of an ANN, some other modelling techniques for avoiding the overfitting exist. The first one is the early stopping method and the second one is the method of regularization. In early stopping the data set is divided into three subsets: training set, validation set and test set. The network training stops when the errors for the validation set begins to rise, indicating that the network had began to overfit the data . The method of regularization involves modification of the network performance function, that is the sum of squares of the network errors on the training set. The modified performance function will cause the network to have smaller weights and biases, and this will force the network response to be smoother and less likely to overfit (Demuth and Beale, 2002;Kiliç et al., 2007).
The number of the ANN's hidden layers has also an important role for the ANN architecture. When the more hidden layers exist, then the more flexible the ANN model might become at simulating with better accuracy the data of the learning set. However, the serious risk of the ANN to be unable to simulate well unknown data is very possible, since the existence of more than one hidden layer (e.g.  Three or more hidden layers are causing unnecessary computational load to the ANN model, while an ANN architecture with one hidden layer is a more stable design. Generally, the majority of ANNs used in eco-hydrological modelling are having only one hidden layer (Basant et al., 2010), because one hidden layer is sufficient to produce arbitrarily complex outputs (Lippmann, 1987). Theoretical studies have proved that only one hidden layer is sufficient for the ANN to approximate any complex nonlinear function (Cybenko, 1989;Kisi, 2005), however the existence of more hidden layers is not prohibited (Kisi, 2005).
A graphical representation of the MSE for two different ANN architectures are presented below (see Figure 2.7). The first created ANN is having one hidden layer and the MSE is calculated for the test set and the training set. The second created ANN is having two hidden layers and the MSE is calculated for the test set and the training set. As it can be observed by the Figure 2.7, the ANN with only one hidden layer is having better performance (smaller SME), verifying the theory that in most cases the existence of only one hidden layer is giving better modelling results.

Measured data set pretreatment
The measured data set before been presented to the ANN must be pretreated/ processed. This step is a very important since it helps to reduce the ANN model computational complexity and ensures that all the input variables will be treated by the ANN model the same way, where all these will be discussed in detail below.
The tactic to apply Principal Component Analysis (PCA) on the water quality data set before presenting the variables to the ANN model, is a widely used data-pretreatment method when using ANN models. By that way the ANN's input parameters are selected by a larger-initial set of candidates-parameters Zhang et al., 2015;. PCA is often combined with ANN modelling, because by that way dimension reduction is enabled and the model's computational complexity is reduced, so the possibility of a model's misconvergence and poor accuracy is eliminated Hadjisolomou et al., 2016). For example, in a relevant ANN modelling study of Gazzaz et al (2012) -which was using water quality variables as predictors for the water quality of the Kinta River (Malaysia)-after applying PCA to the raw dataset, the 30 initial candidates parameters served as inputs for the ANN were reduced to 23 parameters used as inputs.
After PCA is performed and decided which variables to retain as ANN inputs, then the data are pre-treated based on the following steps that are summarized by Gazzaz et al (2012) as described below: 1. The censored, missing and inconsistent measured data values were assigned an arithmetical value based on a modelling technique like linear regression, nonlinear regression, ordinary least squares; or the mean value of the problematic variable for the associated monitoring period.
2. Data samples corresponding to statistical outliers and structural zeros were removed from the data set.
3. All the input variables of the ANN were standardized prior been presented to the ANN model.
The role of step 3, which is dealing with data standardization before been presented to the created ANN, is very crucial. That step is necessary in order to avoid model bias because of the different unit's magnitude of the variables (Hadjisolomou et al., 2018). Data transformation is helping to remove any possible offsets and it ensures that the input variables are contributing the same to the created ANN model and the distribution of the input variables are matching the distribution of the simulated output (Basant et al., 2010). The most commonly used standardization method for variables in ANN modelling is based on the following transformation, which is explained by Srinivasan et al. (1994): where in the above equation the term a is standing for the lower limit of the standardization range, the term b is standing for the upper limit of the standardization range, the term Xs is symbolizing the transformed observations of the parameter X, the term X0 is symbolizing the raw observations of the parameter X, the term Xmin is symbolizing the minimum value of the parameter X, the term Xmax is symbolizing the maximum value of the parameter X. However, in most ANN modelling studies the above equation (Equation 2.14) is adjusted/modified, where the terms a=0 and b=1 and this transformation is corresponding for the well known Min-Max transformation and is having the following formula (Gazzaz et al., 2012;Antanasijevic et al., 2014): This is because the input variables of the ANN are standardized into the range (0,1) of the logistic sigmoid transfer/ activation function (Gazzaz et al., 2012).
It must be noted that when using the phytoplankton or primary production as input variables, then these variables must be log-transformed before been rescaled to the interval (0,1). According to , this is necessary because the mean square error of the ANN will be biased, if raw data are used and since the algal biomass data with high values are containing greater sampling and measurement error, which error might dominate the ANN's output. After the ANN's training is over, then the simulated outputs are rescaled back to their initial (raw) form prior to post modelling computations (Basant et al., 2010;Hadjisolomou et al., 2016). observations of the parameter X, the term X0 is symbolizing the raw observations of the parameter X, the term Xmin is symbolizing the minimum value of the parameter X, the term Xmax is symbolizing the maximum value of the parameter X. Also the Xmean is symbolizing the mean value of the parameter X, the term Xstd is symbolizing the standard deviation of the parameter X and the term Xmedian is symbolizing the median value of the parameter X.
As stated above PCA is a widely used method to select the most relevant input variables for the ANN model. However, the PCA method is not unique and the selection of the ANN's input variables can be also done with the method of sensitivity analysis. As stated by Dogan et al. (2009) a sensitivity analysis is used in order to find out key input variables of the ANN and unnecessary input variables must be avoided, because the ANN's training process might be confused. The basic idea behind the use of sensitivity analysis usage in order to eliminate unnecessary input variables is that those input variables, which might have a weak impact on the ANN's simulated output, can be excluded from the input variables set (Rankovic et al., 2010). The workflow behind this tactic is to have the input variables to be sequentially excluded one by one by creating different ANNs that are not using them as inputs (Cho et al., 2014), and by that way a more compact final ANN model is created (Dogan et al., 2009).
As stated by Chen et al. (2006) this procedure is resembling a neural network trimming process, that it starts from the most complicated neural network (the one with the most input variables) and then is reducing towards the dominant features. The procedure of an ANN trimming is presented as in Table 2.1, where the Chl-a levels of one week ahead (Chl-a(t+dt)) are simulated based on the measured parameters Chl-a, pH, dissolved oxygen (DO), total inorganic nitrogen (TIN), total inorganic phosphorus (TIP), water temperature (Tw), rainfall (R). After the trimming procedure the resulted ANN model with the best performance is selected for the modelling process.

Evaluation of the ANN's Performance
The performance of an ANN is a very important task of the modelling process, since based on the ANN's performance the modelling results can be evaluated and the created ANN can be characterized as a reliable/good predictor or not (Hadjisolomou et al., 2016). Also, among a group of created ANNs -which are simulating a specific phenomenon like Chl-a values or dissolved oxygen concentration-the ANN with the highest/better performance is selected to be the most suitable for modelling that specific phenomenon (e.g. Najah et al., 2011;Sener et al., 2012;. As stated by Gazzaz et al. (2012) among these created ANN models, the one having the highest R 2 and the lowest MSE and RE is preferred, while the higher the R 2 value and closer to 1 the better.  It is given based on the below mathematical formula.
 Coefficient of determination (R 2 ): it is the square of the correlation coefficient.
It is giving the percentage of variability that can be explained by the ANN model and is having the two alternatives mathematical forms as below.

Sensitivity Analysis
Sensitivity analysis is determining/measuring how much sensitive is a are mentioning the 'backward stepwise' method as another very important sensitivity method.
The 'Perturb' method is computing the perturbation effect of the input variables regarding the output variable . The effect that a small change of an input variable has on the ANN's output is examined, so the input variables can be classified by an order of importance . The mathematical formula that is describing the 'Perturb' method of sensitivity analysis is explained by  as follow: where the parameter Np is representing the number of patterns (samples number), constructed with inputs and corresponding outputs for the training set.
The 'Weights' method is related with the interpretation of the connections weights from the input layer to the hidden layer . The contribution of each input parameter to the ANN's output is measured with the relative parameter importance (I) equation and is given by  as bellow: where nT is the number of time lags, nH the number of hidden neurons and nV is the number of input parameters.
The 'PaD' method or the Partial Derivatives method was firstly introduced by . This method is following two steps. First the partial derivatives of the output for small changes of each input are computed and afterwards the classification of the relative contributions of each input variable . The sensitivity of the ANN's output for the input variable xi is symbolized as SSDi and according to  is the sum of the squared partial derivatives obtained per input variable and has the following mathematical formula: where N is the observations number of the data set; dji the partial derivatives of the output yj with respect to input xj. The equation for dji is given bellow: where nh the neurons number of the hidden layer; Ihj the output of the h-th hidden neuron for the j-th input; wih the weight connecting the i-th input neuron and the h-th hidden neuron; who the weight connecting the output and the h-th hidden neuron; Sj is the derivative of the output yj with respect to input xj.

Two-way sensitivity analysis
The 'PaD2' method is examining the synergistic effect between the input parameters and is considered as a two-way interaction sensitivity analysis method. The pair-wise interactions of the variables are examined based on the 'PaD2' method. The application of the 'PaD2' method is very similar with the application of the 'Pad' method, but the 'PaD2' method is considered as more advanced modelling, because it considers the two-way interactions between variables, since ecological phenomena are rarely linear or a simple cause originated . As stated by Olay-Marin et al. (2016) the 'Pad' method is making use only of one independent variable at a time, while the 'PaD2' method is calculating the relationship when one predictive variable interacts with another. According to  the first part of the 'PaD2' method that gives the partial derivatives of the output yj with respect to inputs xji and xj2 is given by the following equation: where the symbols were explained above, except sj that is the second derivative of the output neuron with respect to its input van Maanen et al., 2010). The relative contribution of the coupled variables to the ANN is the sum of squared partial derivatives of the coupled variables and is calculated as below: The larger the value of the variable SSDi , the larger impact (influence) the input variable xi has on the simulated output (van Maanen et al., 2010). 3. Unsupervised Neural Networks.

Self-Organizing Map (SOM) neural networks
The Kohonen's Self Organizing Map (SOM) is a special category of ANNs and more details of its origin can be found in the studies of Kohonen (1982); Kohonen (1998);Kohonen (2001). SOM is an unsupervised learning algorithm of ANNs, meaning that no human intervention is required during the learning process of the ANN (Zhang et al., 2008). The term unsupervised is attributed to SOM because no supervision is needed for its learning process.
The term self-organizing is derived from the SOM's ability to learn and organize information without being given the associated output values for the corresponding input data, while the desired output is not known a priori (Al Mudhaf et al., 2010).
SOMs are given great concern as modelling tools in water quality studies, since they have the ability to analyze multivariant data by means of a sophisticated visualization capacity and they are a good alternative to the classical water quality modelling techniques ). Generally, the SOM models are considered as a superior modelling technique comparing with the classical models used in water quality studies (Hadjisolomou et al., 2018), something that is discussed in more detail in a following chapter (see Chapter 4).
The SOM neural networks have started to be used widely in the water quality modelling because they are cable to resolve some drawbacks that other ANNs are facing, like the satisfactory description of the cause-effect relationship between multivariate data sets . Some examples of hydrological studies that are using the SOM neural Network are presented next, in order to highlight the significance of the SOM method in water quality modelling. In their study of Li et al. (2017) clustered 27 lakes in China into four groups with the use of a SOM and associated the 24 monitored water quality parameters with a trophic level index, in order to evaluate how prone was each lake to eutrophication. In another study of Lee and Scholz (2006a) a SOM neural network was used in order to predict the heave metal removal performance in experimental constructed wetlands and to characterize the heavy metal removal mechanisms. In their study Al Mudhaf et al. (2010) used the SOM neural network, aiming to examine if the levels of trihalomethanes (disinfection byproducts of health concern) in desalinated drinking water depended on the geographical location of water intake.
The SOM neural network is having many modelling abilities, which are making it an attractive modelling technique in water quality sciences. Several of these abilities are presented below as mentioned by Buscema (2010): 1. The SOM processes all the records and all the variables simultaneously.
2. SOMs are not sensitive to variables cardinality.
3. The SOM neural network can process non-linear relationships among the data.  .
Regarding the dimension reduction ability of the SOM Peeters et al. (2007) stated that the SOM is projecting multidimensional data into a two-dimensional grid in a topology preserving way and is associating non-linear complex relationships between the variables. The SOM is having the ability to represent the information of multi-dimensional environmental data into fewer dimensions (Chon, 2011). For the convenience of visualization, the artificial neurons (nodes) of the SOM are arranged in a limited number of dimensions, preferably two dimensional grids of neurons (Kohonen, 1982). Each neuron is associated with m=[m1, ….,mi] that is a i-dimensional weight vector (or prototype vector), where parameter i equals to the dimension of the input vectors (see Figure 3.1), and the neurons are connected to adjacent neurons by a neighborhood relation based on which the topology of the map is dictated .

The structure and basic steps of the SOM algorithm
The SOM topology is formed based on the local lattice structure and the global map shape. The local lattice structure can be rectangular or hexagonal and the centermost unit is associated with several discrete neighborhoods (size 0,1 and 2), where the first inner polygon is giving the 0-neigborhood, the second inner polygon the 1-neighborhood and the third inner polygon the 2-neighborhood (see Figure 3.2). On that rectangular or hexagonal grid the neurons of the output layer are arranged, and each of the input layer vectors is connected to each of the neurons of the output layer through a weight vector (Peeters et al., 2007;. The possible map shapes of the SOM are the: a) the sheet shape, which is the most commonly used (characterized as the default), b) the cylinder shape and c) the toroid shape. The different map shapes are visualized as in the following figure (Figure 3.3). This means that BMU and its neighboring (adjacent) neurons are changing their weights for each iteration, in order to reduce the distance between the input vector and the weights (Kohonen, 2001;Nourani et al., 2013). The learning procedure of the SOM algorithm is analytically explained in the following steps (Kangas and Kohonen, 1996;Kohonen, 2001;Giraudel and Lek, 2001;Peeters et al., 2007;Zhang, et al., 2008;  Inverse function: where in the functions above the term T is the training length and the term a0 is initial learning rate. The schematically representation of the above learning rate functions are given as in Figure 3.4.  3. An input vector is presented to the network and then distances between the given input vector and the i weight vector are calculated. The Euclidean distance (symbolized as Di) is the most frequently used and the mathematical equation that describes this relationship is given as below: where S is the number of output neurons, R is the dimension of the input vectors, pij represents the j element of the input vector, and wij symbolizes the j element of the i weight vector.
4. The smallest distance is chosen and the Best Matching Unit (BMU), that is defined as the neuron with the weight vector closest to the input variable x, as given by the equation: where || || symbolizes the distance measure, x the input vector, m the weight vector, and c the subscription of the weight vector for the winning neuron.
5. The weights of the BMU and the unit within its neighborhood ratio N(t) are updated in order the new reference vectors to be closer to the input vector (see Figure 3.7). During that process the neighborhood ratio function ℎ ( ) and the learning ratio a(t) are decreasing along with the number of iterations of the model, forcing by that way the network to converge. The weight vector is updated based on the following relationship: where ( + 1) is symbolized the weight vector at the learning step t+1. 6. The process goes on with an interactive way and the steps 3 to 5 are repeated, until a predefined number of iterations is done and then the process starts again from step 3.
After training the SOM its quality must be assessed.
where EQE is the quantization error; m is the number of input vectors of the SOM; Xp the data vectors; Mc(p) the winner vectors of Xp.
where m is the number of input vectors of the SOM; Xp is a data vector.   and has the following formula: Another proposed way for determining the optimum map size of the SOM is to found for which neurons number the quantization error (QE) and the topographic error (TE) are minimized (Lee and Scholz, 2006a;Jeong et al., 2010;Wang et al., 2014;An et al., 2016).
This can be archived by examining several map sizes of the trained SOM and to find for which one the TE and QE are having their lowest values (Wang et al., 2014), where this procedure can be graphically represented in the following figure (Figure 3.8) and the optimum map size is the one with the 3X3 map size for the specific SOM application that is described in the study of Tsai et al. (2017). The TE and QE are decreasing gradually as the map size of the trained SOM is increasing (see Figure 3.9) and the optimum size is based on local minimum values for the parameters TE and QE, however the heuristic rule as suggested by Vesanto and Alhoniemi The input data before being presented to the SOM network must be pre-processed, because different magnitude of the water quality variables may result to unequal importance for the weight update procedure, and by that way the SOM model performance is improved . Furthrmore, all the input variables are treated by the SOM in the same way (Garcia and Gonzalez, 2004). In their study Zhang et al. (2009) are emphasizing that the SOM modelling results might be affected by an inappropriate pre-processing method and they propose a way to resolve this problem based on the Shapiro-Wilk normality test in order to find of the measured variables were following normal distributions.
Several data pre-processing (transformation) methods for the SOM input parameters are found in the literature (e.g. Garcia and Gonzalez, 2004;Peeters et al., 2007;Zhang et al., 2009 and are presented as below:  The logarithmic normalization: where x tr symbolizes the transformed data, x symbolizes the initial data, α is a real number.  The zero-mean (z-score) normalization: where x tr symbolizes the standardized data, x symbolizes the initial data, xmean the mean deviation if x, σx the standard deviation of x.
 The Min-Max normalization: where x tr is the transformed data x, min(x) the minimum value of x, max(x) the maximum value of x.
There is no particular constrain for transforming the SOM's input variable data for any of the mentioned normalizations above, since as it stated by Giraudel and Lek (2001) that in opposition with the correspondence analysis, it is possible to have negative numbers in the data sample and a standardization of the measured parameters can be made before presenting the input variables to the SOM model.

U-matrix and Component Planes
The The darker the shade the largest is the Euclidean distance between the neurons of the SOM map, therefore a light shade between the neurons of the SOM map is indicating a small Euclidean distance (Vesanto et al., 1999;Recknagel et al., 2006b). The darkest shades on the U-matrix are representing biggest distances between neighboring data and therefore the borders between the clusters are defined (Recknagel et al., 2006a, Farsadnia et al., 2014, and by that way it is possible to achieve the identification and visualization on the U-matrix for the several clusters of the input environmental data (Scholz, 2011). The CPs are used in order to visualize the value of the SOM's input variables in each map unit 2006a). The CPs are considered as a good tool to evaluate the interrelationships between the several water quality parameters Hadjisolomou et al., 2018). These interrelationships between the SOM input variables can be observed by examining the Figure 3.11, which is representing the CPs of several surface water quality parameters measured in the area of Hong Kong.
Through the CPs the exploratory potential among the SOM's input variables is provided and therefore the CPs visualization are making the SOM neural network to be characterized as a useful tool in exploratory data analysis and clustering of multivariate data sets (Peeters et al., 2007;Hadjisolomou et al., 2018), while the visualization of inter-specific association among the CPs is supporting the SOMs role as a useful tool in exploratory data analysis  The CPs mapping interpretation is based on degrading color shadings. Each CP is associating the value range of every input variable of the trained SOM with each map unit (Vesanto et al., 1999). The SOM is a technique used in data analysis clustering for exploring, in order to discern any reasonable relationships among the environmental variables and in most modelling cases without prior knowledge or assumptions of the given data set (Bieroza et al., 2009). Therefore, the term "correlation hunting" is given by Vesanto (1999) that states that the CPs can also be used for correlation hunting among the SOM's input variables. The possibility to have even local correlations might exist, when two CPs are resembling each other only to some regions . So, the term correlation is not limited only to just linear correlation, but also includes nonlinear correlation and local (partial) correlation between the SOM's input variables (Vesanto and Ahola, 1999).
The SOM can identify the corresponding correlations and similarities between the water quality variables (Dong et al., 2012). When there is strong positive correlation between two water quality variables, then their associated CPs are very identical. When there is strong negative correlation between two water quality variables, then their associated CPs are identical but with reverse degrading of color shading Hadjisolomou et al., 2018). For example, in the above figure (Figure 3.11) by examining the CPs of the Total Phosphorus parameter and the Chemical Oxygen Demand parameter it is observed that there is strong positive correlation between these two water quality parameters, since the CPs are very similar and the dark scale coloring (red) areas are following an identical degradation towards the light scale coloring (blue) areas.
The spherical/complete implementation of the SOM results requires some visualization and clustering tools to be used, in order to have substantial information on the input data distribution and interrelationships among the SOM's input variables (Bieroza et al., 2009).
Besides the U-matrix and the CPs, the sample distribution map (see Figure 3.12) and the hit histogram (see Figure 3.13) are also used for implementing the results of the trained SOM.
With the use of the sample distribution map for each map node the most frequent BMU with an assigned data set index (e.g. sampling site number) is associated (Bieroza et al., 2009;Zhang et al., 2015), meaning that with the help of the sample distribution map the sampling sites (or every other possible assigned index) are distributed and clustered analogously. symbol is for polluted (taken by Zhang et al., 2015).
By that way the sampling sites can be related with the CPs maps and to make associations about the environmental conditions that are characterizing each sampling site, like their environmental parameters value levels (therefore to evaluate the behavior and interactions of the water quality parameters for that specific sampling site), to exam their grouping and similarity/ or dissimilarity (e.g. comparison of mean, maximum, minimum values) with other sampling sites and try to find possible reasons/events that are involved for this. The hit histogram of the trained SOM is displaying the distribution of sample on the SOM map (Bieroza et al., 2009;Zhang et al., 2015, Hadjisolomou et al., 2018. Specifically, for Figure 3.13 (a) the hit histogram of the trained SOM is displaying the distribution of the number of sample hits for each cell (node). This hit histogram is illustrating how many times each prototype neuron is the winning neuron for the data set of the water quality variables and the neurons with higher number of hits are representing more data samples with similar water quality properties . Regarding the Figure3.13 (b), the hit histogram of the trained SOM is displaying the distinction between the data set samples (e.g. polluted or unpolluted water samples) and the distribution of the density of sample hits for each cell (node), where the more samples hits for each cell the more intense (having larger surface) is the color inside it.
In conclusion it can be said that this visualization of the hit histogram of SOM analysis is displaying the density of the SOM hits based on a distinction of the data sample. According to Bieroza et al., (2009)  When referring about the sample distribution, then each neuron of the SOM map is giving the assigned name of a sampling site (or every other possible assigned index) of the most frequent best matching sample and is representing many data samples that are assigned to this winning neuron; while in the case of the hit histogram then is given the number of those data samples that are associated with the winning neuron .

The Cluster Structuring Index
The influence of each input variable on the organization of the SOM map is measured with the use of an index, called the Cluster Structuring Index (CSI or SI). The SI was initially created to asses which of the measured diatom species were having the strongest influence on the organization of the SOM map (Park et al., 2005;Park et al., 2006). As stated by Park et al. (2005) the idea behind the SI relied upon the observation that when a cluster defined in the SOM is an assembly of neighboring units of the SOM, and therefore when a variable is specific to a certain cluster and is having smaller difference between the weights of SOM units. The SI for the i-th variable can be calculated based on the formula given below Park et al., 2006): ammonia nitrogen (NH3-N). The SOM managed to produce meaningful patterns among the 25 fish species and the water quality parameters for the 276 heterogeneous data sets measured across Taiwan, while the SI regarding the water quality parameters was evaluated (for more see Fig. 3.14). the regression determination coefficient (R 2 ) between SI values of the new reduced data set and the initial data set is calculated.
In the study of Park et al. (2006) the problem of too many input variables of a SOM is resolved in a similar way, where the SI was used to create several reduced data classes from the original data set and the created classes with small SI were excluded (see Fig. 3.15). Then the new created data class is chosen based on the examination of the sum of the Euclidean distances of the SI values of the original data set and the reduced data set. After the the calculation of the distances, the reduced data set with the smallest sum value is chosen. It was then observed that the profile of the distances (see Fig. 3.16) is creating a criterion for choosing the SI value associating with a reduced number of input variables and for which the loss of significant quantities of ecological information is minimized. The chosen reduced data set had the higher regression determination coefficient among the examined reduced data sets.
The examination if the ecological information is preserved for the new reduced data set was the same as the previous example described by Tison et al. (2005), while the samples of the new data class are used to train the SOM.  The first method for clustering of the trained SOM prototypes is based on the Umatrix's observation, meaning that clusters are detected after visual inspection is performed on the U-matrix . This procedure of clustering is described as in   (2000)).
The application of the k-means algorithm for clustering the SOM's prototypes is a relatively easy task, widely used in SOM water quality modelling by many researchers e.g. (2004) Kim and An (2015); Li et al. (2017). Initially the SOM neural network is trained and then the k-means clustering algorithm -an iterative clustering algorithm that divides a given data set into a number of k clusters-, is applied on the nodes of the trained SOM. The k-means algorithm is partitioning the input data into a specified number of clusters based on the Umatrix (Recknagel et al., 2006b). It must be noted that, when the number of clusters is increased, then the algorithm is sensitive to outliers because the number of samples in each cluster is decreasing .

Garcia and Gonzalez
Meanwhile with the usage of the Davies-Bouldin index it is calculated the optimal number of the k groups (clusters) for the prototypes (nodes) of the trained SOM Farsadnia et al., 2014). The Davies-Bouldin index -for analytical information see the original study of Davies and Bouldin (1979)-is having its minimum value for the optimal number of clusters of the trained SOM ). The procedure for finding the optimal number of clusters based on the Davies-Bouldin index is graphically illustrated as in the Figure 3.19. After calculating the Davies-Bouldin clustering index, based on which the optimal number of clusters of the SOM is calculated and symbolized with n, then the n-cluster structure of the SOM map can be described. For example the SOM clustering that is graphically illustrated as in Figure 3.20, is showing how is done the division/clustering of the SOM map for the water quality data in a n-cluster structure, based on the procedure described previously by the Figure 3.19. The study of , which is examining with the use of a SOM model the surface water quality of 4 watersheds in Turkey, in order to investigate the relationships between the water quality variables, is analytically describing the association of Figure 3.19 and Figure 3.20. In that study the Davies-Bouldin Index was found to be seven and therefore these 4 watersheds were clustered in 7 clusters in the SOM map, allowing the authors to extract information about the variables' associations in each group, e.g. for group 1 the high water temperature was associated with low rainfall and high Total Dissolved Solids concentrations.

Application of the SOM neural network for prediction
The SOM neural network is mainly used as a data clustering methodology and only few researchers have applied the SOM for prediction, in contrast with the supervised ANNs that are mainly used for prediction. The SOM is flexible to predict time series and because of its visualization abilities allow the user to understand the dynamics of the underlying process in a better way than the MLP neural networks (Barreto, 2007). According to Simon et al. (2003) the process for selecting the best topology of the SOM model when it is used for time series prediction is not really simple in practice, therefore they have proposed a methodology for selecting the best SOM neural network topology based on a normalized error (NSSE) criterion as given by the equation below (Eq. 3.16):  The SOM neural network that can be used for time series prediction can be also successfully applied in the field ec0ohydrological sciences for the prediction of variables that are not cost-efficient (expensive to measure). For example, in their study Zhang et al. (2008) simulated the nutrient removal performance in integrated constructed wetlands by applying a SOM neural network. The SOM model managed to predict with good accuracy the ammonianitrogen and SRP outflow concentrations with the use of variables (WT; EC; DO) that can be measured less cost-effective.

The SOM Toolbox for Matlab developed by the Laboratory of Information and
Computer Science at Helsinki University of Technology (Vesanto et al., 1999) is the main tool that is used in the majority of SOM modelling studies in the field of eco-hydrology e.g. Lee In conclusion, the SOM neural network has been proved to be an efficient way to evaluate the complex relationships between water quality variables. The SOM model is characterized as an excellent practical tool, which is good in prediction and that is not affected by possible computational barriers like missing values and incomplete data sets (Zhang et al, 2009). Therefore, it had started to be used as a management tool in order to solve several water quality problems like agricultural, industrial and municipal water pollution ). Besides SOM's ability to model complex non-linear relationships more reliable than other modelling methodologies, the advanced visualization abilities of the SOM neural network are making to be a superior modelling tool when comparing with other classical modelling techniques Hadjisolomou et al., 2018).

References
Al Also, some basics of statistical theory necessary for the needs of many modelling studies are given. The way that these methodologies are combined with ANNs in order to resolve some modelling issues is also examined. Finally, ANN models are compared against these broadly used modelling methodologies applied for water quality modelling, in order to evaluate/compare them as modelling tools. The MLR models are also widely used in the field of water quality research, where they are managing to represent with success difficult modelling tasks. For that reason, several studies were carried out using MLR examining water quality related problems. Such study is the one carried out by Chu et al. (2013), in which the MLR model was applied in order to examine the relationship between the water quality parameters and the land cover changes after typhoon events in a watershed obtained by satellite images. In the study of Zhang et al.

Applications of Statistical Techniques in water quality modelling
(2012) MLR methods are used for associating the impact of agricultural land use intensity on the water quality of five Beijing's mountainous watersheds.

Correlation Coefficient
The Pearson correlation symbolized as correlation coefficient (r) is the most widely used measure to associate two variables for continues types data and provides a measure of the linear association of two variables (Marques de Sa, 2007) and is calculated based on the type: where x is the mean of variable x; y is the mean of variable y; n the number of measured samples. When the correlation coefficient equals +1 then it is indicated that there is a perfect sympathetic linear relationship between the two variables, while when the correlation coefficient equals -1 the there is a linear antipathetic relationship between the two variables (Marques de Sa, 2007;. Prior calculating Pearson correlation coefficient, it is recommended that the data to be transformed, because the Pearson method is very sensitive to data outliers and deviations from the main structure of the data . Another widely used type of correlation coefficient is the Spearman's correlation coefficient and is symbolized as rs. The formula that gives the Spearman's correlation coefficient is having the follow form (Mukaka, 2012): where di 2 is the difference in ranks for variable x and variable y; n is the observations number.
The alternative form of the Spearman's correlation coefficient is as bellow (Millard & Neerchal, 2000): where the Rx is the rank for variable x and Ry rank for variable y; n is the observations number.
In contrast with the Pearson correlation coefficient the Spearman's correlation coefficient measures any kind of monotonic relationship between x and y and is not sensitive to the outliers (Mukaka, 2012;Millard & Neerchal, 2000).

Principal Components Analysis
Principal Components Analysis (PCA) is a multivariate statistical technique that is used to reduce the number of variables of a data set to a smaller number of variables without information loss of the initial data set (Hou et al., 2015). A new set of variables called principal components are obtained from applying PCA on the data set. New uncorrelated (orthogonal) variables are created after performing VARIMAX rotation, which they are linear combination of the initial variables (Chen et al., 2016) and eigenvalues and eigenvectors are extracted from the covariance matrix of the initial data set. Prior applying PCA it is recommended the data to be standardized in order to be reflected in a uniform measurement system across all variables (Hou et al., 2015 where Fj symbolizes the component score; xj is the jth measured value of a variable; p is the total number of variables; ajp is the loading of the pth variable in the jth factor; j=1, 2, …., p. Some rules suggesting the number of how many principal components to retain are given as below (Jakson, 1993;Rencher, 2002;Marques de Sa, 2007): 1. Select the principal components that explain a certain percentage of the total variance (say, 80%).
2. The Guttman-Kaiser criterion retains these principal components that their eigenvalues are greater than the average of the eigenvalues (over 1 for standardised data).

The scree test uses the scree-plot (graph of the eigenvalues) and search for a break
between the large and small eigenvalues. where λi is the eigenvalue of the ith principal component.

Factor Analysis
Factor Analysis is multivariate statistical method used to find a small number of factors from a data set of many correlated variables and the dimensionality of the initial data set can be reduced with the application of factor analysis (Astel et al., 2004). Factor Analysis differs from PCA because Factor Analysis imposes a strict structure of a fixed number of common factors, while PCA determines p factors in a decreasing order of importance . Also, a major difference between PCA and Factor Analysis is based on the fact where the εi is the error term; the coefficients λij are called loadings and served as weights, showing how each variable depends from the loading.
Generally, the number of extracted factors must be less than the number of measured variables (Astel et al., 2004). The three most used selection methods for the number of factors to be extracted are very similar to those used for PCA for choosing the number of principal components to retain and they are described by Brown (2006) as follow: 1. The Kaiser-Guttman rule: also known as the Kaiser criterion or the eigenvalues greater than one rule, which uses eigenvalues derived from the input correlation matrix.
2. The scree test: is based on the inspection of a graph of the eigenvalues (x axis) and the factors (y axis) and to determine where the graph changes slope.
3. Parallel Analysis: is based on a scree plot of the eigenvalues derived from the sample data against eigenvalues derived from factoring a random number data set.
After the factors number is decided, the extracted factors are rotated, aiming with rotation to obtain factors with high correlation for some variables and no correlation with the others (Basilevsky, 1994), and providing by that way a more informative interpretation of the factors . Several rotation methods exist like the Varimax, which is an orthogonal rotation and results in orthogonal rotated factors and the Promax; Oblimin; Quartimim methods which are oblique rotation methods, meaning the rotated factors aren't orthogonal .
The process of Factor Analysis can be summarized into the following steps described by

Cluster analysis
Cluster Analysis is a multivariate statistical method that is classifying parameters into clusters (groups) based on their similarity and this classification is done without any prior assumptions regarding the structure or underlying patterns of the original data set (Kura et al., 2013). Cluster Analysis is referred as classification, pattern recognition (unsupervised learning) and numerical taxonomy (Rencher, 2002). Objects are clustered based on their similarity. An index of similarity between each pair of observations is used by many clustering methods. The degree of similarity among objects is measured with the similarity (proximity) or distance matrices and high levels of similarity among objects are having large value in the similarity matrix and small value in the distance matrix (Gore, 2000). The dissimilarity measure is usually calculated based on the squared Euclidean distance, although some other dissimilarity measures exist like the city-block distance (or Manhattan metric); Chebychev distance; Angular distance; Canberra or Mahalonobis distance (Sarstedt & Mooi, 2004).
Cluster methods are divided into four categories: hierarchical methods; partitioning methods; overlapping cluster procedures; ordination techniques Gore, 2000). The two most widely used types of clustering methods are the hierarchical (or nested) and the partitioning (or unnested) methods. In contrast to hierarchical algorithms, the partitioning methods have the number of their clusters pre-determined, meaning known a priori .The main difference between hierarchical clustering and portioning clustering is that with the hierarchical clustering techniques once the objects are assigned to the groups this assignment cannot be changed, while with the partitioning techniques the assignment of objects may change during the algorithm application .
The most famous partitioning algorithm is the K-means algorithm. The main idea of the k-means algorithm is to minimise the average of the squared distances between the observations and their cluster centres (centroids). This process is starting with the algorithm having assigned the observations to their closest centroid, recomputes the cluster centroids and iteratively reallocates the data points to the closest centroid . The Kmeans algorithm is using the within-clusters variation as a measure to form homogenous clusters, in such way that the within-cluster variation to be minimized ). The steps being followed by the K-means algorithm procedure are synopsized as in Table 4.1. The hierarchical clustering methods are divided into the well-known agglomerative methods and divisive (or splitting) methods. At the beginning of the clustering process the hierarchical agglomerative methods treat every object as a unique cluster, while the hierarchical divisive methods as a single cluster and proceed by establishing smaller clusters (Gore, 2000). The agglomerative hierarchical methods begin with n items and end with a single cluster containing all n items. The divisive hierarchical method starts with a single cluster containing n objects and by dividing the resulting subgroups at the end n clusters are formed with one object each (Rencher, 2002) as represented in the figure (Fig. 4.1). The Agglomerative Hierarchical clustering techniques start from an object that forms a cluster and compute the distance matrix for the clusters and then join the clusters that have the smallest distance, with this step to be repeated until all objects are united in one cluster . The steps being followed by the Agglomerative Hierarchical clustering methods are synopsized below in Table 4.2.
Several different methods exist to link two clusters during the procedure being followed by the agglomerative hierarchical algorithm . These different agglomerative hierarchical procedures are measuring the distance from a newly formed cluster to a certain object belonging to another cluster with a different way. The most popular of these agglomerative hierarchical procedures are described as below (Rencher, 2002;Tan et al., 2006 ): where d ( y i , y j ) is usually the Euclidean distance; yi object of cluster A; yj object of cluster B; nAnB the distances between the nA points in cluster A and the nB points in cluster B.
 Centroid: The distance between the two clusters is the distance between the centroids (mean vectors) of the two clusters (see Equation 4.12). Eq. 4.13) and the y A is the mean vector for the observation vectors in A; y B is the mean vector for the observation vectors in B; the nA points in cluster A.
 Ward (incremental sum of of squares method): The Ward method is using the withincluster (squared) distances and the between-cluster (squared) distances. The aim of the Ward method is to unify clusters in a way that the resulting clusters are homogenous as possible. The procedure being followed by the Ward clustering algorithm is to merge the two clusters A and B and to minimize the increase in SSE, symbolized as IAB and defined based on the following equations: Eq. 4.18) and nA, nB, nAB= nA+nB are the numbers of points in cluster A, cluster B and cluster AB. The terms SSEA; SSEB; SSEAB are the within-cluster sum of squares.
A generalization formula describing the different hierarchical clustering methods is proposed by Lance and Williams, (1967) and is having the following form (Equation 4.19): Eq. 4.19) where α, β, γ are coefficients; D is distance function; Cj is the new cluster obtained from the joined of the clusters Ci and Ck . The above formula (Equation 4.19) for different values of the coefficients produces different hierarchical clustering methods as in Table 4.3 , where in the below  The results of the hierarchical clustering methods can be visualized with the use of a graphical representation called dendrogram. In the case of hierarchical method, the groups' number is unknown and the final structure of groups and subgroups is presented with the use of the dendrogram (Sivri et al., 2017). A dendrogram can display the observations, the sequence of clusters and the distances between the clusters and large distances indicate the clustering of heterogeneous groups . Because the hierarchical clustering methods proceed until only one cluster to remain, the number of how many clusters to retain is not restrictive (Gore, 2000). As stated by Moskalik et al., (2014), the determination of the number of the clusters is a tricky problem and for that purpose several dozens of automated methods exist, however in the case that there are environmental pointers to the numbers of features investigated then they will be equal to the number of the clusters.
A widely used methodology for determining the number of clusters for the hierarchical clustering methods is calculating the Sneath's index of cluster significance. The Sneath's index of cluster significance gives the clusters number with the use of a cutoff line for the dendrogram that is created after using a hierarchical clustering method. The stringent criterion of Sneath's index of cluster significance equals to 2/3 of Dmax and the less restrictive criterion of Sneath's index of cluster significance equals to 1/3 of Dmax Kosiba et al., 2011;Oszust et al., 2014), where Dmax is the maximum value of the bond distance D (usually the Euclidean).

Multiple Linear Regression
Multiple Linear Regression (MLR) models are predicting a dependent variable from a set of independent variables. The mathematical equation (Equation 4.20) that describes the general MLR model with Y the dependent variable and X1, X2,…..,Xp the independent variables is given as below (Weiseberg, 2005): where the symbol X in E(Y/X) means that we are conditioning on all the terms on the right site of the equation and the terms βi are constant parameters.
Despite the simple form of the MLR model, the multipredictor linear regression model is flexible and represents realistic how the average value of the outcome varies systematically in accordance with the predictors (Vittinghoff et al., 2005). An important drawback of the MLR models is that they can only simulate linear relationships, but they are widely used because of their simple structure with relatively few parameters (Gan et al., 2004).
The MLR models are widely used in water quality modelling and for the prediction of Chl-a levels. For the Chl-a levels prediction several modelling studies exist with good modelling results like the one of Cho et al., (2009). Some researchers applied more sophisticated modelling techniques for the application of the MLR model, like the combination of PCA and the MLR model method. The purpose of that technique is to simplify the complexity of the relations between the independent variables. In that case before applying the MLR method PCA was performed for the independent variables. In such an example Camdevyren et al. (2005) predicted the Chl-a using 76,1% of the variation of biotic and abiotic water quality parameters and with predictive success to be R 2 = 90.8%, reducing by that way the MLR model dimensions.

Studies combining ANN models with other modelling methodologies
ANNs are a powerful modelling tool, however sometimes the combination of ANNs with some other modelling methodologies, e.g. PCA, can provide a solution for some modelling issues that might appear. For example, the problem of computational complexity might result into the ANN to not perform well enough, therefore the ANN model's dimensionality must be reduced by removing any possible variables that increase the computational error of the ANN.
The phenomenon of spatial heterogeneity appears very often when the data set is taken from a large geographical area. In that case the ANN model might not be able to simulate well enough the whole data set, resulting to big simulation error for some inputs.
Another issue that might appear in environmental modelling studies is the phenomenon of collinearity between the variables. As it is stated by Mela and Kopalle (2002) collinearity might reduce the parameter estimates and that positive and negative correlation structures have an asymmetric effect on variable omission bias. One of the most serious modelling considerations is the omitted variables problem and it occurs when omitted variables affect the relationship between the dependent variable and included explanatory variables (Leightner and Inoue, 2012). Some proposed solutions for the problems mentioned above are found in the literature. The way how ANNs water quality modelling studies and the methodology that was used in order to resolve these problems are discussed below.
A very common tactic applied in ANN water quality modelling is to use PCA on the environmental parameters and to remove the less contributing ones. By that way dimension reduction is archived and the ANN model complexity is reduced. For example,  in their study performed PCA on data set of 9 water quality parameters (SRP; DIN; CO 3; HCO3; pH; EC; WT; DO; SD) that are associated with Chl-a production. The PCA indicated that the carbonates were parameters of no importance and they mere not introduced to the constructed ANN modelling Chl-a levels. A second ANN model that included the CO 3; HCO3 as input parameters verified that the carbonates parameters were increasing computational complexity and produced output values with less accuracy (see Table 4.4). showed that the Chl-a levels were estimated with better accuracy when two PCs (summarizing a total of six of the initial parameters) were used as inputs, comparing with the results of simulated Chl-a when all the environmental parameters are used as inputs.
The problem of spatial heterogeneity might appear when modelling data sets from large lakes. In that case the created ANN might be inadequate to produce reliable results. In their study Huang et al. (2015) address the problem of spatial heterogeneity when modelling the Ch-a in the large lake Poyang -China-by applying the k-means algorithm. The 17 sampling sites were clustered into two clusters. Afterwards three different ANN models were constructed, two had as input variables data from only one of the resulted clusters and the other one ANN had as inputs data for the whole data set, where this process is explained analytically as in the graph below (Figure 4.3). The ANN models that studied the data for each cluster separately had better results than the ANN model that simulated Ch-a for the whole data set.
The phenomenon of collinearity between variables is problem that might appear during environmental modelling with ANNs. ANNs hybrid models (Fig. 4.4) are solving the problem of collinearity, because they are able to establish the relationship between a dependent variable (y) and the independent variables (xi) with the advantage of variable reduction and zero correlation between the score values, avoiding by that way the problem of collinearity (Ul-Saufie et al., 2013).  In their study Zhang et al. (2010) used a hybrid method that is using an ANN and principal component regression (PCR) in order to predict Chl-a and the basics of the procedure that was followed in that study are described below. where the term̂J is the forecasted residual by ANN.

8 ANN models vs other modelling methodologies
The modelling techniques analyzed above in most cases are producing good modelling results and for that reason are widely used. However, they are not suitable for some data sets and for that reason they can not reproduce reliable results. In that case it is observed that the ANNs can be applied and to give well simulated outputs. This is related to the fact that the ANN is advanced modelling technique with superior modelling abilities. Many modelling studies have observed the superiority of ANN models against other classical modelling techniques. In contrast with MLR models, the ANN models are not providing simple equations, but they also can easily quantify the contribution of each variable (Lek et al., 1996).
Some representative examples of water quality modelling studies that are comparing the results of ANNs against the classical modelling techniques are given below.
For example, in their water quality modelling study Ozel et al. (2017)  and the ANN models had better performance comparing with the MLR model (see Figure   4.5).
ANN and MLR methodologies and it was found that the ANN models had better performance than the the MLR models (e.g. Karul et al., 1999;Karul and Soyupak, 2003;Malek et al., 2011;Karakaya et al., 2013;Rajaee and Boroumand, 2015). Some exemptions are cited in the literature, where the MLR models have better performance than the ANN models when simulating the Chl-a parameter. For example, in their study Merdun and Cinar (2010) simulated the Chl-a parameter with ANN and MLR models and found that the MLR performed slightly better than the ANN model, however it was noticed that the ANOVA test results showed that the difference was not statistically significant (p<0.05).
The superiority of the ANN models against MLR models is based on their inherent nonlinearity and inter-connectivity of ANN together with its ability to learn and generalize information from complex or poorly understood systems (Gan et al., 2004). Besides the fact that the ANN models are better predictors than the MLR models, they are having the advantage of normality for their residuals and their independence from the variable to be predicted and the ability to produce good results with non-transformed data, in contrast with MLR models (Lek et al., 1996).
The phenomenon of dendrogram overcrowding is observed when dendrograms may grow to be so overloaded that leafs become overcrowded, while this drawback of dendrograms is increasing with sample size and may cause a problem of dendrogram readability (Kontaxaki et al.;2010). In their study Sanchez-Martos et al. (2002) are using SOM model and cluster analysis in order to model the groundwater quality based on data set of 168 observations and 9 variables. During the cluster analysis application on the raw (initial) data set the phenomenon of dendrogram overcrowding is observed (see Figure 4.6), where it is difficult to identify groups containing samples that correspond to similar water quality. This problem is attributed to the large number of sampling points in the data set.
In contrast the SOM modelling procedure managed to represent each sampling point with an activation map of 9x9 neurons and the resulted dendrogram (see Figure 4.7) to be easily implemented. Therefore, the SOM method is producing results that are more easily to be implemented with the use of a dendrogram, comparing with the hierarchical clustering methodologies. According to  both the cluster analysis algorithms and the SOM methodologies have the advantage of visualization ability, but the SOM has three more additional advantages: I. SOM visualization is having the ability to present similarity between positive correlated variables but is also able to present similarity between negative correlated variables.
II. SOM visualization is able to detect outliers, data samples that don belong to a wellorganized group.
III. The projection of variables similarity contains semi-quantitative information about the Modelling studies that are using both PCA and the SOM methods are usually producing similar results (e.g. Samecka-Cymerman et al., 2007;An et al., 2016). However, it is observed that the SOM ability to classify environmental samples is more advanced than the PCA ability.
In the study of  the SOM and PCA were used in order to classify groundwater monitoring data in South Korea. Because the groundwater samples were widely dispersed, it was noticed that the PCA results were not satisfactory to classify groundwater and the loading values of the variables did not provide any meaningful information. However, the SOM managed to relate the groundwater samples with the variables and also to classify the groundwater samples into different groups.
In contrast with the traditional statistical methodologies that are not able to present well the information of large data sets, the SOM is suitable for visualization of high-dimensional data, because SOM is converting complex statistical relationships between data sets into simple geometric relationships on a low-dimensional display (Varbiro et al., 2007). According to  the outstanding performance of the SOM model comparing with other modelling techniques is related with the SOM's potential of clustering and classifying data, but also the SOM is ideal for detecting outliers because they are displaying in particular parts of the map without affecting the remaining parts and each outlier takes its place in one unit of the map.
Some other advantages of the SOM that are making it a superior modelling method comparing with the traditional statistical methodologies are: the ability to handle missing data; dealing with nonlinearities of the system; the SOM can be developed from data without requiring a mechanistic knowledge of the system; easily and quickly updated; noise reduction or handling irregular data; SOM provides an advanced visualization method . The SOM allows the visualization of association, while the sample units and the variables concentration can be displayed in the same figure making data interpretation easier . Based on these mentioned above the superiority of SOM is obvious, when comparing with other statistical methods.

Application 1: Assessing the Contribution of the Environmental Parameters to
Eutrophication with the Use of the "PaD" and "PaD2" Methods in a Hypereutrophic Lake

1. Introduction
Over the past few decades eutrophication has emerged as a serious problem affecting the water quality of many lakes worldwide, mainly as a result of increased nutrient loadings related to human activities (Ferreira et al., 2007). Eutrophication is related with phytoplankton overproduction and has as a common proxy: the chlorophyll-a (Chl-a) parameter. Eutrophic conditions in a water column may lead to a Harmful Algal Bloom (HAB) event (Dyhrman, 2008). Three different categories of HABs can be distinguished: toxic algae; potentially toxic algae; and high-biomass blooms, which are called "red tides" (Ferreira et al, 2011).
In the case of toxic HABs events, toxin-producing algal species are involved. In freshwater lakes most of these toxic species are cyanobacteria, formally called blue-green algae (Ha and Pflugmacher, 2013). Because of their toxin production, cyanobacteria have been characterized as potential key hazardous pollutants, by the European Water Framework Directive (2000) (2000/60/EC) (Hoeger et al., 2005). Aquatic organisms may suffer from poisoning because of cyanotoxins released from cyanobacteria when a HAB occurs . The role of cyanotoxins in recreational waters and their effect on human health have started to be under examination the recent years (Koreiviene et al., 2014). It is reported that about 60,000 intoxication incidents, with an overall mortality rate 1.5%, take place per year globally because of algal toxins (Ferrante et al., 2013).
High biomass blooms are associated with hypoxia and anoxia, with partial or even total oxygen depletion in the water (Gubelit and Berezina, 2010). In hypoxic waters aquatic ecosystems and public health is under serious threat. Besides the negative impact on public health and aquatic organisms, high biomass blooms are responsible for a series of other problems affecting water quality, like bad odor and algal scum formation .
The economic cost of these eutrophication related problems is huge. Understanding the links between Chl-a and the associated environmental parameters is the first critical step for eutrophication management. The linkage between eutrophication and environmental parameters is often non-linear and complex (Paerl, 2006), so the simple linear structure of a regression model cannot be applied and other mathematical methods are more suitable for lake modelling. Artificial Neural Networks (ANNs) have proven to be a powerful tool in lake eutrophication assessment (Xu et al., 2001), because of their ability to model phenomena involving non-linear and complex data (Lek and Guegan, 1999).
The aim of this study was the creation of an ANN model that would simulate the Chl-a and the examination of the impact of each environmental parameter associated with eutrophication. The necessity for evaluating in detail the role of each environmental parameter arose because despite several restoration attempts, Lake Pamvotis remains a hypereutrophic lake. The contribution of each parameter was calculated with the use of the Partial Derivatives ("PaD") method. The coupled effect of the parameters were calculated with the use of the "PaD2" method, in an attempt to have a more thorough look into eutrophication process and to examine the synergistic effects between the environmental parameters. To our knowledge this is the first modelling study where the "PaD2" algorithm is applied in order to examine the synergistic effect of environmental parameters on the simulated Chl-a.
The "Pad2" algorithm is a two-way interaction sensitivity method designed for ANNs that provides information regarding the impact of the several combinations of input parameters to the modeled output. In that way the synergistic effects of the environmental parameters is examined and useful conclusions regarding the trophic production can be extracted. By applying the "Pad2" algorithm the input environmental parameters synergy mechanism is revealed and the environmental parameters' interactions are evaluated by the ANN model.
Therefore this ANN modelling study can be considered an advanced management tool compared with other ANN modelling studies that examine only the one-way sensitivity of environmental parameters. The application of these two ANN model sensitivity methods produced interesting results that can act as an advisory tool for any future restoration attempts.

Study Area and Data Collection
Lake Pamvotis is a shallow Mediterranean lake located in northwestern Greece ( Figure   5.1) with mean depth 4.3 m and maximum depth 7.5 m, raising about 470 m above sea level (Papatheodorou et al., 2006). It occupies an area of about 22.8 km 2 and is located next to the city of Ioannina. Pamvotis Lake has been listed among the Natural Special Conservation Areas under the European Community Council Directive on the conservation of natural habitats and of wild fauna and flora (Habitats Directive, EC, 92/43).
Lake Pamvotis can be consider as a "closed" hydrological system, as it has no natural surface outflows and is recharged by karstic springs (Papatheodorou et al., 2006;. During the last several decades the lake has become more eutrophic due to impacts from agricultural activities, livestock, nearby industrial units, discharges of domestic sewage, irrigation, sediment deposit and introduction of alien fishes (Gkelis et al., 2014). In previous years and particularly in 2005, massive cyanobacterial surface blooms were recorded from early summer to late autumn . More recently, Gkelis et al. (2014) assessed the occurrence of cyanobacteria blooms in Lake Pamvotis and highlighted the co-occurrence of more than one cyanotoxins during the warm period and the potential risk for human health. These findings clearly suggest that the trophic status of the lake remains highly eutrophic despite the management practices that have been applied the last decades.
In an attempt to improve the lake trophic status, in 1986 Lake Pamvotis was stocked with several native fish species and the exotic planktivorus Ctenopharygodon idella. The result of this intervention is debatable, since it led to great decline of submerged vegetation (Kagalou et al., 2003;Stafanidis and Papastergiadou, 2007). In 1992 the operation of the sewage treatment unit and the consequent discharge of the effluents into Lapsista channel began

ANNs Methodologies
Multi-layer feedforward ANNs are a very popular category of ANNs, mainly because they are suitable for function approximation. Feedforward ANNs have acyclic topology with no feedback loops among their layers and are used for approximating non-linear mappings among inputs and outputs (Hu and Hwang, 2002). A feedforward ANN has at least three layers and each layer consists of neurons. The first layer is the input layer that imports the input parameters to the network, one or more hidden layers and the output layer that gives the final result. For every neuron there is synaptic weight that connects it with every neuron of the next layer. The weighted sum of a neuron's inputs produces an output with the use of a transfer function (Kuo et al., 2007). The relationship that gives the ANN output from the jth neuron (oj) as given by Kazemi Yazdi and Scholz (2010) has the form: ( ) j j o f u  (Eq. 5.1) and: where f is a transfer function, xiis the input from it neuron belonging to the immediate previous layer, wij is the synaptic weight that connects xi with the jth neuron and zj a bias term. The number of hidden neurons has a major role in the good performance of an ANN.
With a small number of hidden neurons the ANN will not be able to produce good results.
Meanwhile for a big number of hidden neurons the phenomenon of overfitting occurs (Karul et al., 1999). Overfitting exists when the error for the training set is small, but the ANN has large errors for new data. In that case the ANN hasn't learned to generalize to new situations (Demuth et al., 2007). To avoid overfitting many empirical rules that suggest the maximum number of hidden neurons have been proposed. One such practical rule is the one proposed by   (Eq. 5.4) where N H is the number of hidden layer neurons, N I the number of inputs and N TR the number of training samples. The maximum N H must be the smallest number calculated by those two equations. The determination of the optimal number of hidden neurons and layers is decided after a trial and error procedure (Ozesmi et al, 2006). In order to prevent an ANN's overtraining early stopping and regularization methods are used. In early stopping the data set is divided into three subsets: training set, validation set and test set. The network training stops when the errors for the validation set begin to rise, indicating that the network had begun to overfit the data . The regularization method involves modification of the network performance function that is the sum of squares of the network errors in the training set. By using the modified performance function will cause the network to have smaller weights and biases, and this will force the network response to be smoother and less likely to overfit (Demuth et al., 2007).
The Partial Derivatives ("PaD") method, presented by , is a technique for measuring the ANN's sensitivity regarding its input parameters and is a single parameter interaction method. This method has two parts. As analyzed by  the partial derivatives of the output for small changes of each input are computed and then the relative contribution of each input variable is classified. The sensitivity of the ANN's output for the input variable xi is symbolized as SSDi and according to  is the sum of the squared partial derivatives obtained per input variable and has the following form: (Eq. 5.5) where N is the observations number of the data set; djithe partial derivatives of the output yjwith respect to input xj. The equation for SSDi is given below: where nh the neurons number of the hidden layer; Ihjthe output of the h-th hidden neuron for the j-th input; wihthe weight connecting the i-th input neuron and the h-th hidden neuron; whothe weight connecting the output and the h-th hidden neuron; Sjis the derivative of the output yjwith respect to input xj.
The "PaD2" method examines the synergistic effects between the input parameters and is considered a two-way interaction sensitivity analysis method. The application of the "PaD2" method is very similar with the application of the "PaD" method. According to  the first part of the "PaD2" method that gives the partial derivatives of the output yj with respect to inputs xjiand xj2is given by the following equation: where the symbols were explained above, except sj that is the second derivative of the output neuron with respect to its input van Maanen et al., 2010). The relative contribution of the coupled variables to the ANN is the sum of squared partial derivatives of the coupled variables and is calculated as below: (Eq. 5.8)

ANN Model Development
Using Principal Components Analysis (PCA) the selection of ANN's input parameters is done Zhang et al., 2015). PCA is often combined with ANN modelling, because by that way dimension reduction is enabled and the model's computational complexity is reduced, so the possibility of a model's misconvergence and poor accuracy is eliminated . Based on PCA results the carbonate and bicarbonate parameters were excluded, since they contribute less than 2% to the total variation of the data set. The resulting input parameters are SRP, DIN, pH, EC, WT, DO and SD and the model output is Chl-a. In eutrophication modelling with ANNs it is a common practice the Chl-a to be log transformed and then the data set to be normalized (Scadi and Harding, 1999), as in that way the network performance is improved. After network training the data is set back to the initial form.
The method of regularization was used for avoiding model's overfitting, for the regularization method the data set is divided into training set and test set. Therefore the data set consisting of 161 samples was divided into two subsets. The training set containing 80% of the data and the test set 20%. The ANN's training algorithm was the Levenberg-Marquardt (LM) algorithm, because the LM algorithm has fast learning speed and is the best training algorithm among the rest variations of back-propagation algorithms for medium-sized networks Tota-Maharaj and Scholz, 2012). The sigmoid transfer function having the form: Eq. 5.9) was used as the activation function between the layers and by that way the non-linearity was imported into the model.
With the use of MatLab software, the ANN was simulated several times with different numbers of neurons in the hidden layer to find the optimal number of neurons as proposed by Sener et al. (2012), always keeping in mind the restrictions of the maximum number of neurons as given by the Equations (5.3) and (5.4). The optimal neurons number was 10 neurons for the hidden layer, seven neurons for the input layer and one neuron for the output layer. The resulting ANN optimal topology is thus 7-10-1, based in the form L1-H1-L2, where L1 is the number of neurons in the input layer, H1 the number of neurons in the hidden layer and L2 the number of neurons in output layer The best ANN's model performance was calculated for the test set using the coefficient of determination (R 2 ) and the Absolute Relative Error (RE) since they are commonly used statistical variables for evaluating model performance (Cui et al., 2016). The equations that give R 2 and RE are given respectively as below: (Eq. 5.11) where oi is the observed value; oi is the average observed value; si the simulated value; si the average simulated value and N the observations number. The ANN model performance results for all the data subsets are presented in Table 5.1.

ANN's Simulation Results
One-way sensitivity analysis was carried out in order to examine how strong the effect of each input parameter on the simulated output is. The "PaD" method was preferred among other ANN sensitivity techniques, since the "PaD" method is the most stable . The relative importance of input variables to the ANN was found to have the following order of contribution according the results of the "PaD" method: WT, SRP, SD, pH, DO, EC, DIN (Figure 2). The "PaD" method has the ability to discriminate the input variables into minor and major contributing environmental parameters . In our case the input parameters WT, SRP, SD, pH, DO have a major relative importance, while the input parameters EC, DIN have minor relative importance to the simulated Chl-a. The "PaD2" algorithm was used to calculate the synergistic effect of paired input parameters on the modeled Chl-a. For that purpose 21 combinations of paired input parameters were created. The evaluation of the impact for a small change of these paired parameters to the ANN was done with the application of the "PaD2" algorithm. The relative contribution of these interactions are given in Figure 5.3. The sum of the five most influential paired parameters (WT-SRP, SD-WT, WT-pH, SD-pH, pH-SRP) described up to 70% of the ANN model sensitivity. The graphical representation of the most influential parameters enables us to produce useful results regarding the parameters' interactions. The WT-SRP ( Figure 5.4) is the most influential interaction and has a relative importance of almost 22%. As it is graphically observed the Chl-a gradientsstart to rise for high SRP concentrations combined with high WT. Meanwhile the same high WT values combined with lower SRP levels reveal that Chl-a gradientshave the tendency to decrease. Therefore it is obvious that a decrease in SRP levels would lead to a decrease to Chla values, even during the hot months that are associated with high algal production . For elevated pH and high WT there is a tendency for the Chl-a gradientto rise.In that way the high pH levels observed related with increased algal activity and high WT that stimulates algal production, are associated. This is in agreement with a study of  which noted that high Chl-a values are linked with high pH, especially during summer. The SD-pH combination ( Figure 5.7) gives negative Chl-a derivatives. This behavior can be explained by the fact that small increase of SD values due to application of the "PaD2" algorithm results in contrary behavior regarding Chl-a production. The SD is negatively correlated with Chl-a levels (LaBounty, 2008). The pH-SRP combination ( Figure 5.8) although it follows no specific pattern, shows that for high SRP and pH levels the Chl-a gradients take their highest values and for lower SRP levels the derivatives have smaller values, so the ANN managed to correctly connect the SRP effect on Chl-a and to associate it with high pH because of increased algal production.     The synergistic effect of the SRP-DIN follows a clear pattern regarding the Chl-a gradient. For high SRP and DIN values the Chl-a derivative has the bigger increase and as the SRP and DIN values decrease the Chl-a derivative tends to decrease accordingly. The reduction of DIN or SRP gives a reduction for Chl-a and the combined reduction of nutrients yields to even higher reduction (Sagrario et al.,2005), something that is verified by the "PaD2" algorithm findings.

Implications for Management and Restoration
This ANN modelling study was aimed at evaluating the influence of each environmental parameter on the simulated Chl-a. A need for careful examination of the environmental parameters resulted, since it was observed that despite several restoration measures that were applied the last decades to Lake Pamvotis, no lasting water quality improvement was achieved. The impact magnitude of the parameters and the way that they interact with other parameters regarding algal production needed to be investigated. For that reason the "PaD" and the"PaD2" algorithms were used, producing interesting findings about the parameters.
An ANN model was chosen for this modelling study instead of any other modelling technique because of the ANNs' ability to model complex systems, making them ideal for lake modelling. The trophic function of a lake can be modeled with the use of an ANN usually with a good correlation, as in this modelling study. Therefore the created ANN can be considered as a reliable predictor of Chl-a providing mathematical trustworthy relationships between the modeled Chl-a and the environmental parameters. Sensitivity analysis algorithms are the linkages between the Chl-a and the environmental parameters' interactions. Furthermore the"PaD2" method enables us to assess the synergism of coupled parameters on the simulated Chl-a, providing us an advanced management tool comparing with other ANN modelling studies that deal only with one-way parameter sensitivity analysis. Interesting trends between variables synergistic effect and the simulated Chl-a were revealed, like the WT synergism with the SRP that makes restoration efforts more difficult.
Some of the input parameters were found to be more influential than others; however the importance of each input parameter must not be underestimated. All the ANN model's input parameters were found to have an effect on the simulated Chl-a and an omission of any of the input parameters as an input would lead to poor ANN model performance. For that reason it was decided to pay attention to the DIN as well, even though it was found to have small relative importance for the Chl-a. The increased WT related with climatic change increases the risk for a potentially toxic cyanobacterial bloom in lakes with high nitrogen concentration . The high relative importance of the SRP revealed that it is the key nutrient for Lake Pamvotis and has a significant role in the algal production. In a relevant study of Lake Pamvotis Papatheodorou et al. (2006) also found that the key nutrient is the SRP. The combined nutrients "synergistic" effect was pronounced in summer and fall, when algal production is highest (LaBounty, 2008). For that reason lake modelling studies that examine the nutrients reduction effect were carried out, like the one of Mateus et al. (2014).
Our modelling study confirms these results, since the paired SRP-DIN parameter showed that a decrease of nutrients would lead to a Chl-a decrease. The "PaD2" algorithm findings regarding the nutrients behavior could act as a practical management guide tool for decision makers.
Another interesting finding was that the most important parameter was the WT, confirming that way  theory that the effects of climate change for Lake Pamvotis may be greater than those caused by anthropogenic pressure. It is generally accepted that temperature increase leads to more intense eutrophication symptoms (Feuchtmayr et al., 2009;Moss et al., 2011). Keeping in mind the results of the SRP-DIN synergistic effect and the major role of WT, then the necessity for extra nutrient reduction is obvious because the more eutrophic a lake is then the effect of temperature is greater on algal production (Ye et al., 2011). As it is stated by Papastergiadou et al. (2010) Lake Pamvotis is unlikely to switch to clear water conditions and has a tendency for eutrophic conditions, but improvement of water quality can be achieved through specific management practices like nutrient reduction from the catchment area and the sediment, increasing the flushing rate by increasing the karstic springs' discharge by rediverting the springs located on the northern shore into the lake, and by biomanipulation practices to minimize the population of the allochthonous benthivorous species. The Lake Pamvotis natural tendency for eutrophication can be partly explained by the paired SRP-WT parameter computational results. Climatic change favors the phosphate release from anoxic sediments and increases phosphate internal loading, especially during hot months (Mooij et al., 2009). Prolonged and intensified eutrophication might be related with an internal nutrient supply in the water body, even if external input no longer exists (Guneralp and Barlas, 2003). The climatic change and the associated WT increase must be taken into consideration for any restoration measures, since Lake Pamvotis is highly susceptible to increased WT effects.
For Lake Pamvotis other environmental factors that affect eutrophication should also be examined. For example modelling studies that examine the role of fish, macrophytes and zooplankton in lake restoration have been initiated. Models dealing with other restoration techniques like biomanipulation could be developed in order to provide alternative eutrophication management options. Sagehashi et al. (2001) created such a model for a shallow basin that combines as parameters planktivorous fish, three types of zooplankton, two types of algae and nutrients. Based on that model they simulated some restoration methods like biomanipulation, dredging and nutrient loading reduction and the analog effect on water quality. It is suggested as this ANN modelling study of Lake Pamvotis should be updated with new extra environmental parameters that would include long-term monitored data like the ones mentioned above, e.g.., planktivorous fish and zooplankton density.

Conclusions
Lake algal production is considered a very complex issue and the relationships with the associated environmental parameters are difficult to examine. However ANNs are ideal for lake modelling because of their ability to model non-linear situations. A small change of an input variable produces the sensitivity result of the ANN and the way the modeled output responds to that change. With the use of the "PaD" and "PaD2" methods the one-way sensitivity and the two-way sensitivity are calculated accordingly, revealing in that way how Chl-a responds to changes of the environmental variables. The results given by the "PaD" and "PaD2" methods are very important for understanding a lake function and how each environmental variable affects the algal production. Especially for eutrophic and hypertrophic lakes, these findings have the role of advisory restoration tools. For example by using the "PaD" method the key nutrient is found. Also with the use of "PaD2" method it is examined if the restoration measures must follow a one nutrient control policy or combined nutrient reduction. Meanwhile the synergistic effect between the environmental parameters can act as a warning tool for hypertrophic lakes, since the "PaD2" method can reveal if for the hot months and for certain nutrient levels a HAB event might occur.
The ANN managed successfully to simulate the Chl-a levels with a good correlation so the ANN model can be considered as a reliable predictor, based on which the effect of the environmental parameters can be calculated. The ANN's one-way sensitivity analysis was performed with the use of "PaD" algorithm. The most contributing parameters were found to be the WT and SRP parameters. The synergistic effect of the parameters was calculated with the use of the "PaD2" algorithm, a paired sensitivity analysis method. The paired parameters interactions revealed that the SRP-WT combination had the higher relative contribution to the simulated Chl-a. These results showed that the SRP is the lake's key nutrient and that the lake algal production is highly affected by climatic change.
The DIN had no major relative contribution. However the role of DIN must not be underestimated in the restoration attempts, since the ANN results showed that the synergistic effect of SRP-DIN has a clear association pattern with the algal production. Taking all these into consideration, it is indicated that the management measures taken for lake restoration must be updated and new restoration practices should be applied alongside the existing ones.
This ANN model combined with the "PaD" and "PaD2" algorithms has become a successful management guiding tool revealing trends between Chl-a production and the environmental variables.
6. Application 2: Evaluating the contributing environmental parameters associated with eutrophication in a shallow lake by applying Artificial Neural Networks Techniques.

Introduction
Environmental pollution is a serious problem affecting the modern world, therefore several studies were carried out in order to examine the consequences of pollution on the environment and human health like the one conducted by Grigoropoulos et al. (2008). The most examined pollutants used to be the micro elements but in recent studies the toxic metals started to be taken into consideration as well, because of their negative impact on the environment and human health (Aydin et al., 2015). Special emphasis is given on the pollution of lakes because they are the major providers of freshwater world-wide, covering several human needs like drinking and irrigation. It is estimated that lakes contain about 90% of the liquid freshwater on earth's surface (Mishra and Garg, 2011). Surface water eutrophication is considered the most widespread problem to water environment quality around the world (Wu et al., 2011). The causes of eutrophication are closely related to nutrients in-puts, mainly phosphorus and nitrogen, into the water column through anthropogenic activities such as sewage discharge and agriculture (Aydin et al, 2015;Hou et al., 2013;Wang et al., 2013). In many cases eutrophication might cause a Harmful Algal Bloom (HAB) with many unpleasant consequences for a lake, like fish kills or scum formation (Atou et al., 2013). A possible way to prevent HABs is through monitoring. Monitoring of water resources is a way to protect them from pollution and to prevent environmental contamination from pollutants like fertilizers and pesticides (Yagmur et al., 2014).
The negative impact of eutrophication on water quality led to the development of several models trying to simulate it. Artificial Neural Net-works (ANNs) ability to process problems with non linear and complex data, even if the data are imprecise and noisy (Lek and Guegan, 1999), has made the ANNs ideal for lake eutrophication modelling. Another advantage of ANNs regarding Chl-a forecasting is that they can satisfy the modelling objectives by using only routine monitoring data (Li et al., 2010). In our case a time lagged ANN is created for Chl-a simulation. Some examples of ANN models used for fore-casting Chl-a with weekly time-lag are described by ; Palani et al. (2008); Park et al. (2015) while ANN models with monthly time-lag are described by Belgrano et al. (2001); . The aim of this modelling study was the development of a one month ahead forecasting time series model, which would simulate the Chl-a levels. The created ANN model has the ability to predict possible HABs, giving the local authorities the chance to intervene and prevent the HAB event. However, the primary purpose of this modelling study is to evaluate the impact of each environ-mental parameter associated with the eutrophication phenomenon.
With the usage of sensitivity analysis techniques the effect of the environmental para-meters can be measured and so to have an insightful of the eutrophication process. By that way the created ANN model is given the ability to simulate different management scenarios for its parameters, especially for the nutrients, and therefore to produce a guideline for several restoration techniques that could be applied for water quality improvement.

Monitoring Area and Monitored Data.
Pamvotis Lake is located next to the city of Ioannina, in northwestern Greece ( Figure   6.1). More details about Lake Pamvotis are given in Chapter 5.
The used data were from a monitoring study that was carried out for the chronological

Artificial Neural Networks Methodologies.
A very popular ANN is the multi-layer feedforward network. Feedforward networks are usually divided into three layers and each layer is consisted by neurons. The first layer is the input layer, one intermediate hidden layer and the output layer that gives the final result. Each neuron in a layer is connected with all the neurons in the next layer with a synaptic weight.
Aggregation is performed on the weighted inputs of every neuron from the previous layer and an output value is yielded through a transfer function (Recknagel et al., 1997 where f is the transfer function; xi is the input from ith neuron from the previous layer; wij is the synaptic weight that connects xi with the jth neuron and zj a bias term. The output of each neuron is computed and propagated to the next layer and the network output is compared with the given output. The learning procedure is repeated for several times with the use of a training algorithm and each time the synaptic weights are adjusted until they minimize an error function, usually taken the mean square difference between the predicted and the given output (Sengorur et al., 2006;. The methods of crossvalidation and regularization are two approaches widely used to avoid over-fitting phenomenon . Over-fitting occurs when the ANN has big error for new data simulations making the model unable to generalize well new situations.
The topology of a 3 layer ANN can be represented as L1-H1-L2, where L1 is the number of neurons in the input layer, H1 the number of neurons in the hidden layer and L2 the number of neurons in the output layer. To ensure that there aren't too many neurons into the hidden layer and so the ANN starts over-fitting, some rules of thumb are proposed but no assured methods exist until now . In our case the number of the hidden layer neurons is according to the rules proposed by : (Eq. 6. 3) and (Eq. 6.4) where N H is the number of hidden layer neurons, N I the number of inputs and N TR the number of training samples. The maximum N H must be the minimum number calculated by those two equations.
Several methodologies have been proposed for evaluating an ANN model's sensitivity for its input parameters and finding their contribution. In this modelling study three different sensitivity techniques are applied. The first one is the 'Perturb' method. The 'Perturb' method is computing the perturbation effect of the input variables regarding the output variable . The effect that a small change of an input variable has on the ANN's output is examined and the input variables are classified by an order of importance . The 'Weights' method is related with the interpretation of the connections weights from the input layer to the hidden layer . The contribution of each input parameter to the ANN's output is measured with the relative parameter importance (I) equation, which is given by  as bellow: (Eq. 6.5) where nT is the number of time lags; nH is the number of hidden neurons and nV is the number of input parameters.
The 'PaD' method or the Partial Derivatives method was firstly described by Dimopoulos et al. . This method is following two steps. First the partial derivatives of the output for small changes of each input are computed and afterwards the classification of the relative contributions of each input variable ance .
The sensitivity of the ANN's output for the input variable xi, is symbolized as SSDi and according to  is the sum of the squared partial derivatives obtained per input variable and has the following form: (Eq. 6.6) where N is the observations number of the data set; dji the partial derivatives of the output yj with respect to input xj. The equation for dji is given bellow: (Eq. 6.7) where nh the neurons number of the hidden layer; Ihj the output of the h-th hidden neuron for the j-th input; wih the weight connecting the i-th input neuron and the h-th hidden neuron; who the weight connecting the output and the h-th hidden neuron; Sj is the derivative of the output yj with respect to input xj. The ANN's training algorithm was the Levenberg-Marquardt (LM) algorithm, because the LM algorithm has the fastest convergence among the variations of back-propagation algorithm when the data set is up to few hundreds parameters (Hagan et al., 1996)  .
In order to reduce the model's dimensionality and therefore computational complexity Principal Components Analysis (PCA) is applied to the data set. Following the methodology described by Akratos et al. (2008); Zounemat-Kermani (2014) the ANN is simulated with and without the candidate parameters combinations. As proposed by Sener et al. (2012)  Among the tested topologies the best topology was the 8-10-1, with r=0.959 for the test set and r=0.94 for the learning set. The best combination of input parameters was SRP(t-1), DIN(t-1), pH(t-1), EC(t-1), WT(t-1), DO(t-1), SD(t-1) and Chl-a(t-1). The model's output is Chl-at, where t is the time variable measured in months. A synopsis of the model's description is given in Table 6.2. The parameter Chl-a(t-1) was included as an input since Chl-a levels are influenced by their previous values (Palani et al., 2008) and algal production has nonstationary behavior . The introduction of the time lagged parameter Chla(t-1) as a tactic used in many ANN studies simulating Chl-a like the ones of ; Palani et al. (2008); Wu et al. (2014). The PCA revealed that the carbonate parameters have small contribution to the total variation. Additionally, ANN's performance was higher when the carbonates weren't included as an input parameter (Table 6.3). Therefore, the carbonate parameters were excluded as input parameters, since they have no real impact on Chl-a simulations.

Results and Discussion
The graphical representation of the real against the simulated Chl-a values (Figure 6.2) shows that the ANN simulated well the Chl-a with most of the simulated values being very similar with the measured ones, so the created ANN model can be characterized as reliable. In some cases, the measured data that correspond to extreme Chl-a values are not matching very well with the simulated data, increasing by that way the ANN's error. However, the ANN model managed to characterize these extreme data correctly, as eutrophic or hyper-trophic.
The 'Perturb' method was applied in order to examine the effect of each input parameter on the model's output. Each input parameter was increased for +10% and the change of the model output was calculated. The results of the 'Perturb' method ( Figure 6.3) are showing the simulated output tendency for reduction or increase when a specific input parameter is fluctuated. It is simulated that an increase of a parameter's arithmetical value is associated with an increase of Chl-a levels, except for the case of EC parameter that has a reduction of Chl-a levels. This is associated with the fact that the EC is negatively correlated with Chl-a concentrations and the lowest EC values are observed during a HAB . The WT according to the 'Perturb' method is the most influential parameter, a finding verified by a study of .
The application of the 'Weights' method illustrated the relative contribution of each parameter to the simulated Chl-a output (Figure 6.4). The input para-meter Chl-a(t-1) was the most influential. Some other ANN modelling studies of eutrophication e.g.. ; , verify our findings since they found the input parameter Chl-a(t-1) as the most important. The SRP followed second confirming the high correlation of SRP with Chl-a (jeppesen et al., 2006). The WT, SD, DO, EC, pH and DIN were next regarding relative parameter importance.
The results of the 'PaD' method ( Figure 6.5) discriminated the input parameters into two groups. The first group has parameters with high relative contribution. In this group the DO parameter has the higher relative contribution, followed by Chl-a(t-1), SRP, SD and pH parameters. The second group has input parameters with almost zero contribution and is consisted of DIN, WT and EC. This discrimination between minor and major contributing parameters with the use of the 'PaD' method is also observed by Mouton et al. (2008) and was attributed to the high sensitivity of the 'PaD' method.
The results obtained by the sensitivity methods led to useful conclusions, highlighting the need for these results to be incorporated into a unique arithmetical value for each input parameter that will be used for the evaluation of the environmental parameters. An importance index for each input parameter is created, that assigns each parameter an importance order value from the highest to the lowest contributing parameter. For every sensitivity method the importance order index is calculated for each input parameter. The mean importance order value is calculated for the input para-meters based on the results obtained by the three sensitivity methods (Table 6.4). Therefore, the mean importance order index is the combined interpretation of the results given by the three sensitivity methods. By that way the computational differences found for the input parameters when are calculated with a different sensitivity method are balanced. These differences are explained by the different calculation procedures followed by each sensitivity technique ).
The mean importance index found that the Chl-a(t-1) input parameter has the most significant role for the eutrophication process. The DO input parameter was calculated to have the second most important role. This is attributed to the fact that the DO parameter has a strong connection with the input parameter Chl-a(t-1), since the lake is supplied with oxygen by algal photosynthesis (Misra et al., 2011

Conclusions
ANN models can successfully simulate multi-parameter modelling scenarios because of their ability to process non-linear relationships between the dependent and independent variables. In our case the created ANN managed to produce simulated outputs highly correlated with the measured data, making our model a reliable predictor. The application of the three sensitivity techniques allowed us a thorough look into the contributing parameters and their role regarding eutrophication. The introduction of the mean importance order index gave as the advantage to balance the computational borne differences resulted after applying the three different sensitivity methods.
The necessity for a modelling study before any restoration attempt is obvious, because the decision makers can identify the correct restoration technique for the lake. A lake model that combines parameters like planktivorous fish, zooplankton and nutrients is proposed for a future modelling study regarding lake Pamvotis. Unfortunately, no such data set is available in our case, so only the nutrient reduction effect can be examined. The SRP has a leading contributing role regarding eutrophication, in contrast with DIN that was found to have the smallest impact on the model. Therefore, the SRP can be characterized as the key nutrient of the lake and special emphasis must be given on SRP reduction. Nevertheless, the DIN contribution to eutrophication must not be underestimated since based on 'Perturb' method results it is observed that an increase of DIN causes an increase of Chl-a values. A further combined reduction of SRP and DIN is proposed, since the nutrients are associated mainly with anthropogenic activities indicating the need for extra restoration measures for the improvement of lake's trophic status.

Introduction
Freshwater quality has declined in the last decades throughout Europe, due to various environmental issues related to anthropogenic activities. Eutrophication is considered as one of the most important environmental problems that affects freshwater, coastal, and marine ecosystems worldwide (Yang et al., 2008). Freshwater lakes are major providers of water for several purposes, such as water supply, drinking, irrigation, and so forth. Eutrophication has a negative impact on water quality, with ecological and socioeconomic consequences.
Eutrophication triggers various physical and chemical changes in the aquatic environment that may cause the blooming of certain harmful-toxin-producing algae (cyanophyta), which are known to create health issues for organisms living in the lakes as well as humans (Ignatiades and Gotsis-Skretas, 2012). Because of the adverse effects of eutrophication, the necessity for mitigating eutrophic phenomena and recovering water quality has become a priority for environmental scientists (Yang et al., 2008). The effect of eutrophication on public health is so serious that cyanobacteria have been characterized as potential key hazardous pollutants by the European Water Framework Directive (2000) (2000/60/EC) (Hoeger et al., 2004). Water sports, such as swimming, in eutrophic lakes and the consumption of seafood and drinking water contaminated with cyanotoxins are the main reasons for human illness related to cyanotoxins. It is reported that about 60,000 intoxication incidents, with an overall mortality rate of 1.5%, take place per year globally because of algal toxins (Ferrenet et al., 2013).
Artificial neural networks (ANNs) are considered to be a computational modelling tool that is widely used in solving many complex real-world problems (Maier and Dandy, 2001;Odabas, 2014). In recent years, ANNs have become a desirable tool that is applied in many scientific topics (Moustris et al., 2011), such as air pollution, precipitation, water quality, and classification of rainfall prediction (Nastos et al., 2013). The ability of ANNs to learn from the known data without being affected by nonlinearity and their classification potential qualify them as being superior to other traditional statistical tools (Simsek, 2016). Modelling studies that used both ANNs and multiple linear regression (MLR) methods found that ANNs produced better modelling results than MLR in most cases (Odabas, 2014). The main weakness of MLR models as compared with ANN models is that they are based on a linear relationship between input and target variables (Simsek, 2016).
ANNs are divided into two main categories: ANNs with supervised learning and ANNs with unsupervised learning. The most popular supervised ANN is the multilayer perceptron with a backpropagation algorithm, while the most popular unsupervised ANN is the Kohonen selforganizing map (SOM) (Park et al., 2003). According to Peeters et al. (2007), the SOM has recently started to be used in exploratory data analysis, e.g.., Tsai et al. (2017), Ejarque-Gonzalez &Butturini (2014), and it can help with summarizing available data and extracting useful information. The SOM can deal with the phenomenon of nonlinearity, handle noisy data, and be updated easily (Lee and Scholz, 2006a); because of this potential, the SOM algorithm is considered as a powerful tool in exploratory data analysis and the clustering of multivariate data sets (Peeters et al., 2007). The importance of the SOM for water quality management is significant, as the SOM has the potential to analyze multidimensional ecological data and simplify them into visual information that is helpful to understanding the ecological process (Lee and Scholz, 2006b).
Application of the SOM algorithm is widely used in the environmental sciences, and especially in studies examining water quality. For example, Recknagel et al. (2006) used SOMs to evaluate the seasonality effect over two adjacent lakes in response to eutrophication control. A SOM was used by Oh et al. (2007) to cluster the phytoplankton communities from a reservoir. In the study of Cheng et al. (2012), the SOM was used to classify fish communities in shallow lakes, based on several biotic and abiotic factors such as water depth, transparency, and dissolved oxygen. The multi-relationships between fish species and river water quality parameters were examined with the use of a SOM by Tsai et al. (2017). In their study, Park et al. (2003) used a SOM to classify 23 different water types, such as streams, lakes, rivers, canals, and ponds. The SOM classified the sampling sites into five clusters based on environmental parameters, and related them with species richness. In another modelling study, Tota-Maharaj and Scholz (2013) applied the SOM to simulate microbial data from the effluent of pavement systems used to treat stormwater runoff, with a good accuracy, having a minimum correlation of R=0.751 between the real and predicted data, and using as the model's inputs parameters that are not expensive to measure; whereas the measurement of microbial pathogens concentrations is a time-consuming and expensive procedure. These examples demonstrate the applicability of the SOM algorithm not only in ecological research, but also in water management studies.
The objective of this study was twofold. First, we assessed the interactions of the environmental parameters related to the algal productivity of two transboundary Greek Lakes-Megali Prespa (or Great Prespa) and Mikri Prespa (or Small Prespa)-by applying a combination of a PCA, a cluster analysis, and a SOM algorithm. Second, we compared the results among the different techniques to evaluate the effectiveness of SOMs in relation to more "traditional" tools such as PCA and cluster analysis. The importance of this work also lies within the fact that there is no other published work using a SOM to study the water quality parameters related to the eutrophication of Greek lakes, to the knowledge of the authors. Therefore, this study could set the basis for the modelling with SOMs of additional Greek lakes that are impacted by the effects of eutrophication. Additionally, the results of this study can be useful for exploring the mechanisms associated with algal productivity in the studied area.

Study Area and Data Collection
The transboundary Prespa area is a geographically remote area located in northwestern Greece. It is an area of great ecological importance, and has been declared a national park (in 1974), a Ramsar wetland of international importance (in 1987), an important bird area (in 1983), and a Natura 2000 site (Pyrovetsi, 1989). Lakes Megali Prespa, at 849m above sea level (asl) , and Mikri Prespa, at 853.5m asl (Koussouris et al., 1989), are surrounded by mountains and located in the transboundary Prespa area shared by Greece, Albania, and the Former Yugoslavian Republic of Macedonia (FYROM) (Figure 7.1).
The climate of the area is characterized as sub-Mediterranean with continental influences, with frequent snowfall in the winter and summer rain drops .The transboundary catchment's area covers about 1300 km 2 , and the permanent population of the Greek part is estimated at about 1500 residents (Pyrovetsi, 1989).
Megali Prespa is a large, deep lake, covering 254 km 2 ,with a 14m mean and 48m maximum water depth (Cvetkoska et al., 2016;). Mikri Prespa is a shallow lake, with approximately 48 km 2 of surface area, and a maximum water depth of 8 m (Loffler et al., 1998;Albrecht et al., 2012;Stefanidis and Papastergiadou, 2010) and mean depth of 4.1 m (Koussouris et al., 1989;Albrecht et al., 2012;Stefanidis and Papastergiadou, 2010). The two lakes formed a single lake in the past, but nowadays are distinct and connected through an artificial channel . Lake Mikri Prespa is supplied with water only seasonally through surface runoff, mainly from small rivers (Koussouris et al., 1989), and overflows into Lake Megali Prespa (Cvetkoska et al., 2016;Leng et al., 2013). The inflow from Lake Mikri Prespa is about 9%, while the rest of the water inflow into Megali Prespa is from small streams (56%) and direct precipitation (35%) (Aufgebauer et al., 2012). Lake Megali Prespa has no surface outflow, but is connected through karstic channels to the neighbouring Lake Ohrid .
The trophic status of Lake Mikri Prespa is characterized as eutrophic, and prolonged cyanobacterial blooms occur, which may start in spring and persist until December, favoured by the warm climate (Vardaka et al., 2005). Lake Megali Prespa is considered mesotrophic Albrecht et al., 2012;Leng et al., 2013), with summer bottom anoxia and an average total phosphorus concentration of 31 mg·m −3 (Leng et al., 2013). Water samples were collected from a total of fifteen sampling sites in Lake Mikri Prespa and four sites in Lake Megali Prespa (Figure 7.1). Samplings were carried out on a seasonal basis during spring, summer, and autumn, for the monitoring period of -2008(Stefanidis and Papastergiadou, 2010). The measured environmental parameters were: pH, surface dissolved oxygen (DO),electrical conductivity (EC), Secchi disk (SD) depth, water depth, surface water temperature (WT),total phosphorus (TP),dissolved inorganic nitrogen (DIN),and chlorophyll-a (Chl-a).
The parameters were measured at several sampling sites within the Greek part of the lakes for each monitoring season. Most of the samples from Lake Megali Prespa were collected from the littoral zone of the lake near the Psarades village, located south of Lake Megali Prespa, and no samples from the pelagic zone were included. The statistical description of the data is given in Table 7.1. More details regarding the monitoring process and the measurement of environmental parameters are given by Stefanidis & Papastergiadou (2010).

Statistical Methods and Theoretical Background
Principal component analysis (PCA) is a multivariate data analysis method used to reveal patterns in large data sets . PCA is a mathematical dimension reduction procedure, used to reduce the number of variables of a data set to a smaller number of variables, without information loss of the initial data set (Birks, 2012). PCA transforms the data into a new set of variables (or coordinates), called principal components (PCs) .
The use of principal components instead of the initial variables is a more reliable way to represent relationships, because the system's noise is reduced (Mueller and Grunsky, 2016). In this modelling study, the environmental parameters were normalized prior to any calculations, and all the samples that contained missing values were removed from the data set. The parameters were standardized by subtracting the sample mean from each observation and then dividing by the sample standard deviation. The simulation results for the statistical methods were carried out using the MatLab software. Only the PCs that explain at least 10% of the variance are considered for this study, as the other PCs contributed very little.
Cluster analysis is a technique that classifies objects into different groups based on their characteristics . The aim of cluster analysis is to find groups (clusters) with homogeneous properties out of heterogeneous large samples, with high internal homogeneity within clusters and high heterogeneity between clusters . High levels of similarity among objects are indicated by a small value in a distance matrix and large values in proximity or similarity matrices (Gore, 2000). Cluster methods are divided into four categories: hierarchical methods, partitioning methods, overlapping cluster procedures, and ordination techniques .
Euclidean distance and Ward agglomerative methods were used for the cluster analysis. The clusters number can be calculated with the use of a cutoff line for the dendrogram, based on the Sneath's index of cluster significance (Sneath and Sokal, 1973). The less restrictive significance criterion of Sneath's index (2/3 of Dmax) and the strict significance criterion of Sneath's index (1/3 of Dmax) may be used, where Dmax is the maximum of the distance measure D.

Self Organizing Map Theory
The Kohonen SOM is an unsupervised ANN. The SOM has the ability to learn without being given the associated output values for the corresponding input data, and the desired output is not known a priori. The goal of the learning process is to classify the input data according to their similarity (Peeters et al., 2007). SOMs are a very practical tool for data visualization; also, SOMs can be used for prediction and correlation analysis, mostly with visual representation (Rub et al, 2009). One of the various applications of the SOM algorithm is the finding of statistically significant dependencies among the variables in a multidimensional data sample, where two highly correlated variables produce two similar component planes ( Barreto-Sanz and Perez-Uribe, 2007). A SOM projects high-dimensional data into a low-dimension space (Kohonen, 2001). This is usually a two-dimensional space, so the neurons are arranged in two dimensions (see Figure 7.2), because the visual summary of the output is more understandable (Aguilera et al., 2001). The sample data can be clustered either as manually determined by a U-matrix or can be automated by a clustering algorithm implemented in the SOM, and usually the hierarchical clustering algorithm is applied . with the use of a SOM; where x1, x2,…,xn are the input variables, n is the input variable's number, and wij is the synaptic weight that is connecting the i input variable with the j node.
The SOM is consisted of an input layer and an output layer that are connected with computational weights (Park et al., 2006;An et al., 2016). The output layer consists of neurons that are arranged in a hexagonal or rectangular grid and are fully interconnected (Peeters et al., 2007). The input patterns, usually after normalization, are imported through the neurons in the input layer. The SOM algorithm can be summarized into the following steps : 1. Weight vector initialization with random values.
2. Use of a distance measure, usually the Euclidean distance, to find the best-matching unit (BMU).
3. Movement closer to the input vector by updating the weight vector of the BMU and the neighboring neurons.
The Euclidean distance (Di) mentioned above is described by the following equation, and calculates the distance measure between the input vector and the i weight vector Zhang et al., 2008): = ∑ ( − ) ; = 1,2, … .
(Eq. 7.1) where S is the number of output neurons,R is the dimension of the input vectors, pij represents the j element of the input vector, and wij symbolizes the j element of the i weight vector. The term BMU is defined, according to Lee & Scholz (2006b), as the neuron with the weight vector closest to the input variable x, as given by the equation:  A descriptive table for the first four PCs and their characteristics is created (Table 7.2). PC1 explains 32.68% of the total variance, and has strong negative loadings for SD and depth, and strong positive loadings for EC and phytoplankton Chl-a. PC1 seems to describe the relationships between SD, depth, EC and Chl-a. PC2 explains 17.29% of the total variance, and has strong positive loadings for pH, WT, and Chl-a. PC2 describes the relationship between the Chl-a and WT parameters. PC3 explains 13.08% of the total variance, and has a strong positive loading for DO and a strong negative loading for TP. PC4 explains 11.12% of the total variance, and has a strong negative loading for DIN. The bivariate plots for the PCs (Figure 7.4) reveal differences among the samples from Lake Megali Prespa, symbolized as L1 (n1=26), and the samples from Lake Mikri Prespa, symbolized as L2 (n2=89). The bivariate plots of PC1 versus the rest of the PCs show a clear separation between the samples from the two lakes. The cluster analysis grouped the samples with respect to the water quality parameters. The cluster analysis divided the data into three clusters, where Ward's linkage method and Euclidean distance measurements were used. The dendrogram constructed by the means of cluster analysis, shown in Figure 7.5, illustrates the arrangement of the clusters.

SOM Algorithm Results
A SOM with 10 × 5 neurons was created, based on the rule given by Equation (7.3). The components planes (CPs) are visualized in Figure 7.6. For the clustering of the SOM prototypes, the K-means algorithm method and the hierarchical algorithm method were used and evaluated in order to find the most appropriate clustering method for this modelling study. The optimal number of clusters minimizes the Davies-Bouldin index when the SOM is implemented by the K-means algorithm Wang et al., 2014). In our case, the optimal cluster number was five (Figure 7.7). The L1 samples (Megali Prespa) correspond to cluster 1 and cluster 2, while the L2 samples (Mikri Prespa) correspond to cluster 3, cluster 4, and cluster 5. The SOM gave a clear classification of the samples from the two lakes. Prespa, the red represents data from Lake Megali Prespa, and the empty nodes are associated with the absence of data samples.
The hits histogram distinguished the Megali Prespa data from the Mikri Prespa data. The difference between the data from Megali Prespa and data from Mikri Prespa is primarily associated with the water transparency (or SD). It is also reversely associated with the EC.
The hierarchical cluster analysis performed on the SOM prototypes calculated three district groups for the investigated lakes. Ward's linkage method and Euclidean distance measurement were used, and a dendrogram was derived (Figure 7.8). The existence of three clusters was computed with the use of the less restrictive significance of Sneath's index (2/3 of Dmax). Based on the dendrogram, the L1 samples (Megali Prespa) are located on the right side (branch) and belong to cluster 3, while the L2 (Mikri Prespa) samples are located on the left side (branch) and belong to cluster 1and cluster 2. As expected, the sampling sites from the shallower Lake Mikri Prespa were associated with higher Chl-a values, as shallow lakes are more prone to nutrient-mixing eutrophication that might benefit algal production . The TP values also increased as water depth decreased. It is documented that strong positive relationships exist between TP loadings and algal biomass . No clear conclusions are extracted for the DO parameter. The pH parameter seems to have a negative relationship with the Chl-a, and generally the pH parameter is associated with eutrophication . Besides the visualization of the results with the CPs, the SOM clusters can also provide useful information regarding the water quality parameter interactions.
For the clustering of the SOM's prototypes, two different clustering methods were used. The K-means algorithm method (see Table 7.3) and the hierarchical algorithm method (see Table 7.4) were applied. The mean values and standard deviation (SD) for each SOM cluster were calculated and are presented in Tables 3 and 4. The K-means algorithm is considered a most suitable clustering method for the result implementation of this modelling study, as it calculated five clusters instead of the three that the hierarchical algorithm found. The existence of five clusters instead of three allows the more detailed examination of the environmental parameter interactions. For example, regarding the Lake Megali Prespa-associated clusters, the role of WT and therefore the seasonality effect is easily observed in Table 7.3, but no clear conclusions can be derived from Table 7.4.

Discussion
Unraveling and investigating trophic function mechanisms is a task of major importance, particularly for developing sustainable management and restoration plans for lakes with poor water quality. There is a large number of monitoring studies that have pointed out that nutrient enrichment has been the main cause of eutrophication, for example ; Kagalou et al. (2008). PCA and cluster analysis are statistical methods commonly used in such studies that often provide satisfactory results and useful conclusions about the main parameters associated with eutrophication processes.
Conversely, SOMs models are considered to be more advanced modelling tools used in water quality modelling studies (Wang et al., 2014). Modelling studies examining lake water quality by combining a SOM model and statistical methodologies are approaching more comprehensively the behaviour of the limnological parameters. Wang et al. (2014) and An et al. (2016) displayed valuable results with statistical methodologies, however the findings of the SOM model were much more practical and specific. The prevalence of the SOM method against PCA is related to the fact that the relationships among environmental variables are nonlinear, while the PCA is based on linear principles ). An important advantage of the SOM compared with the PCA method is based on its visualization abilities, provided by the SOM's component planes. With the use of the component planes, the distribution of the component values is represented and direct visual examination is provided in order to allow correlations between several component planes that can be investigated simultaneously ).
Another advantage of the SOM method compared with the classic statistical techniques is related to the dimension reduction provided by the SOM by projecting multidimensional data into a two-dimensional space. The dendrogram created by applying the hierarchical clustering method on the standardized raw data is usually overcrowded, making its implementation difficult. In contrast, the dendrogram resulting from the SOM prototypes is easily implemented. This is because the SOM neural network abridges the samples of the initial data set to a smaller set of prototypes, providing the possibility to construct a simplified dendrogram .
Representation by the CPs can be considered as an efficient and practical visualization tool for extracting useful conclusions regarding the trophic function of the two lakes. The L1 (Megali Prespa) data are correlated with high SD and low EC values, thus providing a clear separation from L2 (Mikri Prespa) data. The parameter SD seems to have a crucial role for clustering the data of the studied lakes. Meanwhile, the SD parameter has a negative correlation with the EC, since shallower lakes are often associated with higher EC. The high WT are linked with high Chl-a values, mainly for samples from Lake Mikri Prespa. Despite the rather small number of samples collected from the littoral zone of the Greek part of Lake Megali Prespa, the SOM managed to separate the data from the two lakes, and produced results comparable to those of the other statistical tools. The environmental parameters associated with algal production were visualized by the SOM component planes. Through the CPs, the visualization of all variables enables them to be examined simultaneously, and direct associations between the variables for specific value ranges of each examined variable can be obtained. In addition, PCA and cluster analysis do not always capture the hidden information that is provided by the data set. In contrast, the SOM can detect hidden patterns and recognize specific features of the data set which are often different in the short-term assessment . The SOM revealed hidden patterns for each lake after careful examination of the CPs (Figure 7.6). Our results showed that the modelled environmental parameters, which are associated with algal production, followed a pattern influenced by the seasonality effect for each lake data set. Specifically, the Kmeans algorithm clustering results for Lake Mikri Prespa showed that the data of cluster 5 had higher values for the Chl-a and WT parameters than the other two clusters, suggesting the seasonality effect over Lake Mikri Prespa and the elevated algal production during hot months.
The data that are associated with cluster 3 and cluster 4 had moderate values for Chl-a and WT, suggesting such conditions of these parameters during spring and autumn. However, no clear patterning is observed for the rest of the parameters. A deeper examination of the revealed patterns shows that for Lake Mikri Prespa, Chl-a presents its maximum value when the WT is also maximum; while at the same time, the TP presents its minimum and the DIN is relatively very low. Regarding Lake Megali Prespa, the minimum Chl-a value is associated with the maximum SD value, elevated WT value, and relative low TP and DIN values. In contrast, the maximum Chl-a value is associated with the maximums of the pH and WT parameter values, elevated DIN, and a very low TP value. These two clusters suggest an almost reversed patterning between them, where Lake Megali Prespa is affected by the seasonality to a lesser extent than the shallow Lake Mikri Prespa.
Additionally, the SOM more reliably visualizes nonlinear and heterogeneous data than the PCA method into a two-dimensional space (Brosse et al., 2001). The above observations highlight the SOM superiority compared with the PCA, since the SOM captures well the complex nonlinear mechanisms associated with algal production. However, the PCA did not manage to relate the parameter interactions well enough, limited by its linear nature as a method.
For example, the nonlinear relationship between the Chl-a and TP parameters is well presented by the SOM through its CPs as was discussed for each lake data set, and specific information regarding the parameter interactions can be extracted based on the mapping of the data values. In contrast, the PCA (see Figure 7.4) provides only generalized information for a tendency regarding these parameter interactions. Because of this, the SOM is considered to be an innovating method compared with PCA and cluster analysis. Based on the SOM results, high nutrient concentrations were associated with elevated phytoplankton Chl-a values. High water temperature (WT) values were also correlated with the high Chl-a values. The role of water temperature as a key environmental parameter for controlling phytoplankton biomass in Lake Mikri Prespa was highlighted in a study by Tryfon & Moustaka-Gouni (1997), where it was shown that an increase in biomass of cyanophytes occurred at temperatures greater than 16°C.
The SD depth was also found to be a critical parameter based on PCA, cluster analysis, and SOM. Both the cluster analysis method and the SOM method managed to distinguish between the samples from Lakes Mikri and Megali Prespa, mainly because of the SD depth variation.
However, the SOM with the use of K-means algorithm distinguished the water quality parameters into five diverse clusters, providing a deeper examination of the parameter interactions, in contrast with the cluster analysis method, which gave only three clusters. The results presented in Table 7.3 showed that cluster 2 samples are characterized by lower Chl-a and lower WT values than cluster 1, reflecting the seasonal variations of algal productivity of Lake Megali Prespa. Additionally, cluster 3 and cluster 4, which are associated with Mikri Prespa, show noticeably different mean WT values compared with cluster 5. Cluster 5, in particular, seems to be associated with the hot months and the highest temperatures, where Chl-a has the highest mean value. On the other hand, cluster 3 and cluster 4 have lower WT and Chl-a mean values, probably representing conditions typical of the spring and autumn seasons.
The depth parameter, which is strongly correlated with SD depth, was not included as an input variable for the SOM model, because the aim of this modelling study was to focus on the mechanisms that are affecting algal production, and not the morphological characteristics of the lakes. In order to avoid bias in the results, it was decided that the depth parameter would not be included as an input parameter.
It is documented that low water transparency (SD depth) is associated with increased algal production (Jeppesen et al., 2007;Scavia et al., 2014), and Chl-a has an analogous relationship with the SD depth (Hadjisolomou et al., 2016). In a study by Zacharias et al. (2002) that examined the biological and chemical characteristics of the major Greek lakes, it was stated that the SD depth is high in deep lakes and low in shallow lakes. The SOM model that managed to separate the two lakes matched the high SD depth with samples from Lake Megali Prespa and low SD depth values with samples from Lake MikriPrespa. As expected, the observation of the CPs associated the data samples of Lake Mikri Prespa with low SD depth values and increased Chl-a levels, and the data samples of Lake Megali Prespa with high SD depth values and low Chl-a levels.
The role of the SD depth has been given great attention in many studies, in order to discriminate possible environmental factors influencing the water quality between shallow and deep lakes .For example, in a study by Stefanidis et al. (2016), two geomorphologically different lakes were examined, and it was found that the SD depth was a good indicator of the water quality deterioration and enhancement of eutrophication. The above monitoring examples, combined with the fact that the nutrient loading threshold for obtaining clear water conditions differs among climate zones (Jeppesen et al., 2017), lead to the conclusion that modelling limnological data sets is a very case-sensitive task; while the mechanisms controlling limnological parameters are complex and their interactions are hard to examine. Therefore, the application of a SOM is ideal in studying water quality parameters and their interactions in lakes.
The increasing eutrophication that has been reported for the studied lakes (Albrecht et al., 2012) is mainly related to human activities. The last decades' increased land use activities in the catchment's area, such as agriculture, the encroachment of wetland areas, and the expansion of the irrigation system, have had a negative impact on the wetland biodiversity, enhancing the eutrophication processes (Pyrovetsi, 1989). This water quality deterioration was demonstrated by Loffler et al. (1998), where a remarkable decrease of Secchi depth in Lake Megali Prespa, from 7.2 to 10m in the 1950s to 3.2m in the 1990s, was noted. The SOM managed to simulate the algal production process for the two lakes and identify the environmental parameter interactions.
Therefore, SOM algorithms can be used as a tool that can aid management authorities in developing a guideline regarding the restoration measures that should be applied. Besides the water transparency that was one of the key factors associated with water quality, the water temperature was significantly correlated with phytoplankton Chl-a. Based on this finding, it would be interesting to further explore the effects of the anticipated rising of temperature, along with intensification of water level changes on the water quality and the processes of eutrophication.
In our case, the SOM separated the two sampling areas and verified the good water quality status of Lake Megali Prespa. Regarding the shallow Lake Mikri Prespa, which is suffering the effects of eutrophication, the role of water transparency and water temperature should be considered when prioritizing management measures. Proposed management options should mitigate the excessive use of fertilizers, but should also consider other pressures, such as water abstraction, in order to reduce nutrient loadings and provide a relatively stable water level regime.

Conclusions
This modelling study presented different methodologies in order to examine the water quality of the investigated lakes. The results derived from the classic statistical methodologies were compared with the SOM neural network findings. The results from PCA, cluster analysis, and the SOM method agreed about the trophic function properties of the lakes, but the SOM results were more specific, and allowed direct associations between the water quality variables.
The SOM neural network has a basic advantage derived from its visualization abilities. Based on our findings, the SOM neural network can be described as an innovating modelling tool that can be used autonomously or in parallel with the rest of the traditional modelling methods for successfully evaluating freshwater quality.