Optimized Heuristic Implementations in AI Multi‐Agent Systems

The central goal of combinatorial optimization is to deliver a body of results from operations research by applying algorithms and hypergraph theory to a range of theoretical problems in computer science. A key motivation is that thousands of real-life problems can be formulated as abstract combinatorial optimization problems. Moreover, multi-agent systems can also be used to solve problems that are difficult or impossible for conventional systems. In practice, extensive research is required to solve combinatorial optimization problems using multi-agent systems. This dissertation deals with metaheuristic approaches to artificial intelligence algorithms, specifically Particle Swarm Optimization and Ant Colony Optimization. In particular, it focuses on the following topics:
1. combinatorial optimization in intelligent multi-agent systems;
2. the capabilities of modern intelligent multi-agent systems and their viability.
Within this framework, in order to investigate the behavior of these algorithms, we addressed the following three optimization problems:
1. The routing of a ship using the Particle Swarm Optimization algorithm;
2. Assessing the effect of the initial pheromone quantity in the Ant Colony Optimization algorithm when solving routing problems;
3. A comparative study of metaheuristic algorithms for the design of a dartboard.
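Because the abstract names Particle Swarm Optimization as one of the two metaheuristics studied, a minimal sketch of the canonical PSO velocity and position update may be useful. The objective (the sphere function), the swarm size, and the coefficients w, c1, c2 below are illustrative choices, not the dissertation's tuned settings.

```python
import random

def pso(f, dim=2, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise f over [-5, 5]^dim with a basic particle swarm."""
    rnd = random.Random(seed)
    x = [[rnd.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]              # personal best position of each particle
    gbest = min(pbest, key=f)[:]           # best position found by the whole swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rnd.random(), rnd.random()
                # velocity update: inertia + cognitive pull + social pull
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                x[i][d] += v[i][d]
            if f(x[i]) < f(pbest[i]):
                pbest[i] = x[i][:]
                if f(pbest[i]) < f(gbest):
                    gbest = pbest[i][:]
    return gbest

sphere = lambda p: sum(t * t for t in p)
best = pso(sphere)
```

On this smooth unimodal function the swarm contracts quickly around the optimum at the origin; the ship-routing application in the dissertation of course replaces the toy objective with a routing cost.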

However, Ant Colony Optimization has recently emerged as a new metaheuristic for hard combinatorial optimization problems. The second goal of this dissertation is the implementation of a randomized construction heuristic extension of the Ant Colony System algorithm for the Vehicle Scheduling Problem, which is met by the following:
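As a sketch of the kind of randomized construction heuristic meant here, the code below builds a single tour using the Ant Colony System pseudo-random proportional rule. The distance matrix, the uniform pheromone values, and the parameters beta and q0 are made-up examples, not the dissertation's actual instances.

```python
import random

def acs_construct_tour(dist, tau, beta=2.0, q0=0.9, seed=1):
    """Build one tour with the ACS pseudo-random proportional rule.

    dist : symmetric distance matrix; tau : pheromone matrix.
    With probability q0 the ant moves greedily to the city maximising
    tau * eta^beta (eta = 1/dist); otherwise it samples a city with
    probability proportional to that same score (biased exploration).
    """
    rnd = random.Random(seed)
    n = len(dist)
    tour = [0]
    unvisited = set(range(1, n))
    while unvisited:
        i = tour[-1]
        scores = {j: tau[i][j] * (1.0 / dist[i][j]) ** beta for j in unvisited}
        if rnd.random() < q0:                       # exploitation
            nxt = max(scores, key=scores.get)
        else:                                       # biased exploration (roulette)
            total = sum(scores.values())
            r, acc = rnd.random() * total, 0.0
            for j, s in scores.items():
                acc += s
                if acc >= r:
                    nxt = j
                    break
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

# Tiny 4-city example with uniform pheromone.
dist = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 8], [10, 4, 8, 0]]
tau = [[1.0] * 4 for _ in range(4)]
tour = acs_construct_tour(dist, tau)
```

In a full ACS, pheromone would also be updated locally during construction and globally by the best ant; only the construction step is shown here.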

Introduction
Artificial intelligence (AI) is the field that studies and analyzes computational agents acting intelligently (what an agent does) within an environment [1].

More specifically, an agent acts intelligently if:
 what it does is appropriate for its given goals;
 it is flexible to changing environments and changing goals;
 it learns from experience;
 it makes optimal choices given its perceptual and computational limitations.
In addition, a computational agent is an agent whose decisions about its actions can be explained in terms of computation; a decision can be broken down into primitive operations, although there are some agents that are arguably not computational (e.g., the wind and rain eroding a landscape). An agent typically cannot observe the state of the world directly; it has only finite memory and it does not have unlimited time to act. The main scientific/research goal of AI is to comprehend in depth the principles that make intelligent behavior possible in natural or artificial systems by:
 the analysis of natural and artificial agents;
 the formulation and testing of hypotheses about constructing intelligent agents;
 the design and building of, and experimentation with, environments, agents, and tasks.
As part of science, most researchers build empirical systems to test their hypotheses or to explore the space of possibilities. The engineering goal of AI is therefore to design artifacts: agents that act intelligently and are useful in many application fields and research domains. (Agents include worms, dogs, thermostats, airplanes, robots, humans, companies, and countries.)
Nevertheless, it is arguable that in AI there can be no fake intelligence, as it is only the external behavior that defines intelligence; if and when AI is achieved, the intelligence created is real, merely created artificially. This idea of AI defined by external behavior was the motivation for a test designed by Turing (1950), the Turing test [2]. The Turing test consists of an imitation game in which an interrogator can ask a witness, via a text interface, any question [3]. If the interrogator cannot distinguish the witness from a human, the witness must be intelligent. Table 1 shows an example of a possible Turing test; an agent that is not really intelligent could not fake intelligence for arbitrary topics.

Interrogator: In the first line of your sonnet which reads "Shall I compare thee to a summer's day," would not "a spring day" do as well or better?
Witness: It wouldn't scan.
Interrogator: How about "a winter's day"? That would scan all right.
Witness: Yes, but nobody wants to be compared to a winter's day.
Interrogator: Would you say Mr. Pickwick reminded you of Christmas?
Witness: In a way.
Interrogator: Yet Christmas is a winter's day, and I do not think Mr. Pickwick would mind the comparison.
Witness: I don't think you're serious. By a winter's day one means a typical winter's day, rather than a special one like Christmas.
The obvious naturally intelligent agent is the human being. One class of intelligent agents that may be more intelligent than humans is the class of organizations. Ant colonies are a prototypical example of organizations. Each individual ant may not be very intelligent, but an ant colony can act more intelligently than any individual ant. The colony can discover food and exploit it very effectively as well as adapt to changing circumstances. Similarly, companies can develop, manufacture, and distribute products where the sum of the skills required is much more than any individual could master. Modern computers, from low-level hardware to high-level software, are more complicated than any human can understand, yet they are manufactured daily by organizations of humans [4]. Human society viewed as an agent is arguably the most intelligent agent known, and it is instructive to consider where human intelligence comes from, as there are three (3) main sources:
 Biology: Humans have evolved into adaptable animals that can survive in various habitats.
 Culture: Culture provides not only language, but also useful tools, useful concepts, and the wisdom that is passed from parents and teachers to children.
 Life-long learning: Humans learn throughout their life and accumulate knowledge and skills.

These sources interact in complex ways [5]. Biological evolution has provided stages of growth that allow for different learning at different stages of life. Culture interacts strongly with learning. A major part of lifelong learning is what people are taught by parents and teachers. Language, which is part of culture, provides distinctions in the world that should be noticed for learning.

Defining Agents and Environments
An agent is something that acts in an environment [5] [8]. Purposive agents have preferences.
They prefer some states of the world to other states, and they act to try to achieve the states they prefer most. Non-purposive agents are grouped together and called nature. AI is about practical reasoning: perception, reasoning, and acting together comprise an agent acting in an environment [6][7]. An agent's environment may well include other agents. An agent together with its environment is called a world. An agent could be, for example, a coupling of a computational engine with physical sensors and actuators, called a robot, where the environment is a physical setting. It could be the coupling of an advice-giving computer with a human who provides perceptual information and carries out the task. An agent could also be a program that acts in a purely computational environment (i.e., a software agent). Among other things, an agent has:
 goals that it must try to achieve, or preferences over states of the world;
 abilities, which are the primitive actions it is capable of carrying out.
Two deterministic agents with the same prior knowledge, history, abilities, and goals should do the same thing; changing any one of these can result in different actions. Each agent has some internal state that can encode beliefs about its environment and itself. It may have goals to achieve, ways to act in the environment to achieve those goals, and various means to modify its beliefs by reasoning, perception, and learning. There are a number of ways an agent's controller can be used:
 An embedded agent is one that is run in the real world, where the actions are carried out in a real domain and where the sensing comes from a domain.

 A simulated agent is one that is run with a simulated body and environment; that is, where a program takes in the commands and returns appropriate percepts. This is often used to debug a controller before it is deployed.
 An agent system model is where there are models of the controller (which may or may not be the actual code), the body, and the environment that can answer questions about how the agent will behave. Such a model can be used to prove properties of agents before they are built, or it can be used to answer hypothetical questions about an agent that may be difficult or dangerous to answer with the real agent.
Each of these is appropriate for different purposes.
 Embedded mode is how the agent must run to be useful.
 A simulated agent is useful to test and debug the controller when many design options must be explored and building the body is expensive or when the environment is dangerous or inaccessible. It also allows us to test the agent under unusual combinations of conditions that may be difficult to arrange in the actual world.
 How good the simulation is depends on how good the model of the environment is.
Models always have to abstract some aspect of the world. Appropriate abstraction is important for simulations to be able to tell us whether the agent will work in a real environment.
 A model of the agent, a model of the set of possible environments, and a specification of correct behavior allow us to prove theorems about how the agent will work in such environments. For example, we may want to prove that a robot running a particular controller will always get within a certain distance of the target, that it will never get stuck in mazes, or that it will never crash. Of course, whether what is proved turns out to be true depends on how accurate the models are.
 Given a model of the agent and the environment, some aspects of the agent can be left unspecified and can be adjusted to produce the desired or optimal behavior. This is the general idea behind optimization and planning.
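The simulated mode described above can be made concrete with a toy loop in which a program stands in for the body and environment, feeding percepts to the controller and applying its commands. The thermostat-style environment and controller below are hypothetical, chosen only to illustrate the simulate-act loop.

```python
class HeaterEnv:
    """Simulated environment: returns a temperature percept, applies heat commands."""
    def __init__(self, temp=10.0):
        self.temp = temp

    def percept(self):
        return self.temp

    def step(self, heating):
        # Simplistic room dynamics: heat raises the temperature, idling lowers it.
        self.temp += 1.5 if heating else -0.5

def controller(temp, target=20.0):
    """Reactive controller: heat whenever the percept is below the target."""
    return temp < target

env = HeaterEnv()
for _ in range(30):                 # the simulate-act loop
    env.step(controller(env.percept()))
```

Debugging the controller against such a simulated body is cheap; the same controller could later drive an embedded agent with real sensors and actuators.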

Knowledge Representation and Hierarchical Control
Typically, a problem to solve or a task to carry out, as well as what constitutes a solution, is only given informally and is not simple [9][10]. The general framework for solving problems by computer is given in Figure 2 [11]. To solve a problem, the designer of a system must:
 determine what constitutes a solution;
 represent the problem in a language with which a computer can reason;
 compute an output and interpret the output as a solution to the problem.

Knowledge is the information about a domain that is used as part of designing a program to solve problems in that domain. A representation scheme is the form in which knowledge is represented in an agent; it specifies the form of the knowledge base, the representation of all of the knowledge stored by an agent [12]. A good representation scheme should be:
 rich enough to express the knowledge needed to solve the problem;
 as close to the problem as possible: compact, natural, and maintainable;
 amenable to efficient computation;
 able to be acquired from people, data, and past experiences.
An agent can have multiple, even contradictory, models of the world. The models are judged not by whether they are correct, but by whether they are useful [16][17].
Choosing an appropriate level of abstraction is difficult [18][19] because:
 a high-level description is easier for a human to specify and understand;
 a low-level description can be more accurate and more predictive;
 the lower the level, the more difficult it is to reason with;
 you may not know the information needed for a low-level description.
Approximately optimal solution: One of the advantages of a cardinal measure of success is that it allows for approximations. An approximately optimal solution is one whose measure of quality is close to the best that could theoretically be obtained. Typically, agents do not need optimal solutions to problems; they only need to get close enough. For some problems, it is much easier computationally to find an approximately optimal solution than an optimal one. For other problems, however, it is (asymptotically) just as difficult to guarantee finding an approximately optimal solution as it is to guarantee finding an optimal solution. Some approximation algorithms guarantee that a solution is within some range of optimal, but for some algorithms no guarantees are available.
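The approximation guarantees just mentioned can be illustrated with a classical example, not drawn from this dissertation: taking both endpoints of a greedily built maximal matching yields a vertex cover at most twice the minimum size.

```python
from itertools import combinations

def matching_vertex_cover(edges):
    """2-approximation for minimum vertex cover:
    take both endpoints of a greedily built maximal matching."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # edge not yet covered: match it
            cover.update((u, v))
    return cover

def optimal_cover_size(edges, nodes):
    """Brute force, for comparison on tiny graphs only."""
    for k in range(len(nodes) + 1):
        for subset in combinations(nodes, k):
            s = set(subset)
            if all(u in s or v in s for u, v in edges):
                return k

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
approx = matching_vertex_cover(edges)
opt = optimal_cover_size(edges, range(5))
```

On this graph the greedy cover has four vertices while the optimum has three, comfortably within the guaranteed factor of two.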

Probable solution:
A probable solution is one that, even though it may not actually be a solution to the problem, is likely to be a solution. This is one way to approximate, in a precise manner, a satisficing solution. Often you want to distinguish the false-positive error rate (the proportion of the answers given by the computer that are not correct) from the false-negative error rate (the proportion of the correct answers that the computer fails to give). Some applications are much more tolerant of one of these errors than of the other.
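The two error rates can be written down directly; the sets of given and correct answers below are made-up data for illustration.

```python
def error_rates(given, correct):
    """False-positive rate: fraction of given answers that are wrong.
    False-negative rate: fraction of correct answers the system failed to give."""
    given, correct = set(given), set(correct)
    fp_rate = len(given - correct) / len(given)
    fn_rate = len(correct - given) / len(correct)
    return fp_rate, fn_rate

# The system answered {3, 4, 5}; the true answers were {1, 2, 3, 4}.
fp, fn = error_rates(given={3, 4, 5}, correct={1, 2, 3, 4})
```

Here one of the three given answers is wrong (fp = 1/3) and two of the four correct answers were missed (fn = 1/2), making concrete why an application may weight the two rates differently.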

Reasoning (symbols), Perception and Acting
As already mentioned, an agent has belief states that are maintained through time. The belief state can be complex even for a single layer; thus, experience in studying and building intelligent agents suggests that an agent requires some internal representation of its belief state. Knowledge is information about a domain and includes general information that can be applied to particular situations [22][23]. In AI systems, knowledge is typically, but not necessarily, true; this distinction often becomes blurry when one module of an agent treats information as true while another module may be able to revise it.
A knowledge base is built offline and is used online to produce actions. This decomposition of an agent (Figure 3) is orthogonal to the layered view of an agent; an intelligent agent requires both hierarchical organization and knowledge bases. The knowledge base is its long-term memory, where it keeps the knowledge that is needed to act in the future. This knowledge comes from prior knowledge and is combined with what is learned from data and past experiences.
The belief state is the short-term memory of the agent, which maintains the model of the current environment needed between time steps. Then, there is feedback from the inference engine to the knowledge base, because observing and acting in the world provide more data from which to learn. The goals and abilities are given offline, online, or both, depending on the agent. The online computation can be made more efficient if the knowledge base is tuned for the particular goals and abilities. However, this is often not possible when the goals and abilities are only available at runtime. Figure 4 shows more detail of the interface between the agents and the world. The manipulation of symbols to produce action is called reasoning. One way that AI representations differ from computer programs in traditional languages is that an AI representation typically specifies what needs to be computed, not how it is to be computed.
Much AI reasoning involves searching through the space of possibilities to determine how to complete a task [24]. In deciding what an agent will do, there are three aspects of computation that must be distinguished:
 Design-time reasoning is the reasoning carried out to design the agent. It is carried out by the designer of the agent, not the agent itself.
 Offline computation is the computation done by the agent before it has to act. It can include compilation and learning. Offline, the agent takes background knowledge and data and compiles them into a usable form called a knowledge base. Background knowledge can be given either at design time or offline.
 Online computation is the computation done by the agent between observing the environment and acting in the environment. A piece of information obtained online is called an observation. An agent typically must use both its knowledge base and its observations to determine what to do.
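The offline/online split above can be sketched as an agent that compiles its background knowledge into a knowledge base before acting, and then combines that base with runtime observations; all rule, observation, and action names here are hypothetical.

```python
class Agent:
    def __init__(self, background_rules):
        # Offline: compile background knowledge into a fast lookup,
        # playing the role of the knowledge base.
        self.kb = {condition: action for condition, action in background_rules}

    def act(self, observation):
        # Online: combine the knowledge base with the current observation.
        return self.kb.get(observation, "explore")

rules = [("obstacle_ahead", "turn_left"), ("goal_visible", "move_forward")]
agent = Agent(rules)
```

The point of the split is that the (possibly expensive) compilation happens once, while each online decision is a cheap lookup combined with the observation.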
It is important to distinguish between the knowledge in the mind of the designer and the knowledge in the mind of the agent. Two broad strategies have been pursued in building agents:
 The first is to simplify environments and build complex reasoning systems for these simple environments. This is also important for building practical systems, because many environments can be engineered to make them simpler for agents.
 The second strategy is to build simple agents in natural environments. This is inspired by seeing how insects (ants) can survive in complex environments even though they have very limited reasoning abilities.
One of the advantages of simplifying environments is that it may enable us to prove properties of agents or to optimize agents for particular situations. Proving properties or optimization typically requires a model of the agent and its environment [25].

Dimensions of Acting Complexity
Agents acting in environments range in complexity, up to agents with multiple goals acting in competitive environments. A number of dimensions of complexity exist in the design of intelligent agents.
These dimensions may be considered separately but must be combined to build an intelligent agent; together they define a design space for AI [26][27].
1. Modularity is the extent to which a system can be decomposed into interacting modules that can be understood separately. Modularity is important for reducing complexity. It is apparent in the structure of the brain, serves as a foundation of computer science, and is an important part of any large organization. In the modularity dimension, an agent's structure is one of the following:
 flat: there is no organizational structure;
 modular: the system is decomposed into interacting modules that can be understood on their own.
In the planning horizon dimension:
 an indefinite horizon planner is an agent that looks ahead some finite, but not predetermined, number of steps;
 an infinite horizon planner is an agent that plans on going on forever; this is often called a process.
The learning dimension determines whether knowledge is given or is learned (from data or past experience). Learning typically means finding the best model that fits the data. Sometimes this is as simple as tuning a fixed set of parameters, but it can also mean choosing the best representation out of a class of representations.
Learning is a huge field in itself but does not stand in isolation from the rest of AI.
7. The computational limits dimension: sometimes an agent can decide on its best action quickly enough for it to act. This dimension determines whether an agent has:
 perfect rationality, where an agent reasons about the best action without taking into account its limited computational resources;
 bounded rationality, where an agent decides on the best action that it can find given its computational limitations.

Conclusions
AI is a very young discipline. Other disciplines as diverse as philosophy, neurobiology, evolutionary biology, psychology, economics, political science, sociology, anthropology, control engineering, and many more have been studying intelligence much longer. The science of AI could be described as synthetic psychology, experimental philosophy, or computational epistemology. AI can be seen as a way to study the old problem of the nature of knowledge and intelligence, but with a more powerful experimental tool than was previously available.

Multi-Agent Systems, Taxonomy and Architectures
Extending the realm of the modern social world to include autonomous systems has always been of paramount importance. However, when examining the implications of multiple autonomous "agents" interacting in real-world scenarios, what should an agent do when there are other agents, with their own values, who are also reasoning about what to do? We consider both the problem of deciding what to do given a mechanism that specifies how the world works, and the problem of designing a mechanism that has useful properties.

Introduction
Multiagent Systems (MAS) is the emerging subfield of AI that aims to provide principles for the construction and coordination of complex systems involving multiple agents. Like any useful approach, there are some situations for which it is particularly appropriate, and others for which it is not. Thus, it would not be right to claim that MAS should be used when designing all complex systems. The goal of this section is to underscore the need for MAS while giving characteristics of typical domains that can benefit from it [3].

Multi-Agent Systems (MAS):
The most important reason to use MAS when designing a system is that some domains require it. In particular, if there are different people or organizations with different (possibly conflicting) goals and proprietary information, then a multi-agent system is needed to handle their interactions. Even if each organization wants to model its internal affairs with a single system, the organizations will not give authority to any single person to build a system that represents them all: the different organizations will need their own systems that reflect their capabilities and priorities.
Paradigm: Consider a manufacturing scenario in which company X produces tires but subcontracts the production of lug-nuts to company Y. In order to build a single system to automate (certain aspects of) the production process, the internals of both companies X and Y must be modeled. However, neither company is likely to want to relinquish information and/or control to a system designer representing the other company. Perhaps with just two companies involved an agreement could be reached, but with several companies involved, MAS is necessary. The only feasible solution is to allow the various companies to create their own agents that accurately represent their goals and interests. These agents must then be combined into a multi-agent system with the aid of some of the techniques described here [20].
Even in domains that could conceivably use systems that are not distributed, there are several possible reasons to use MAS. Having multiple agents could speed up a system's operation by providing a method for parallel computation. Furthermore, the parallelism of MAS can help deal with limitations imposed by time-bounded reasoning requirements. While parallelism is achieved by assigning different tasks or abilities to different agents, robustness is a benefit of multi-agent systems that have redundant agents. If control and responsibilities are sufficiently shared among different agents, the system can tolerate failures by one or more of the agents. Another benefit of multi-agent systems is their scalability. Since they are inherently modular, it should be easier to add new agents to a multi-agent system than it is to add new capabilities to a monolithic system. Systems whose capabilities and parameters are likely to need to change over time or across agents can also benefit from this advantage of MAS. From a programmer's perspective the modularity of multi-agent systems can lead to simpler programming. Rather than tackling the whole task with a centralized agent, programmers can identify subtasks and assign control of those subtasks to different agents.
Thus, when the choice is between using a multi-agent system or a single-agent system, MAS is often the simpler option [4].

MAS Taxonomy
Several taxonomies have been presented, notably for the related field of Distributed Artificial Intelligence (DAI). Here, three multi-agent scenarios (homogeneous non-communicating agents; heterogeneous non-communicating agents; and heterogeneous communicating agents) are presented in order of increasing complexity and power. The multi-agent scenarios, along with the issues that arise therein, are summarized in Table 2; the dimensions are divided into agent and system characteristics.

From Single-2-Multi-Agent Systems
Although it might seem that single-agent systems should be simpler than multi-agent systems, when dealing with a fixed, complex task the opposite is often the case (see Figure 5). Thus centralized, single-agent systems belong at the end of the progression from simple to complex multi-agent systems. The agent in a single-agent system models itself, the environment, and their interactions; of course the agent itself is part of the environment, but agents are considered to have extra-environmental components as well. They are independent entities with their own goals, actions, and domain knowledge (see Figure 6). Multi-agent systems, on the other hand, differ from single-agent systems in several ways: agents model a plethora of goals and multiple actions, without excluding direct interaction among agents (communication), an interaction that could itself be viewed as environmental stimuli. From an individual agent's perspective, multi-agent systems differ from single-agent systems most significantly in that the environment's dynamics can be determined by other agents. In addition to the uncertainty that may be inherent in the domain, other agents intentionally affect the environment in unpredictable ways. Thus, all multi-agent systems can be viewed as having dynamic environments. Figure 6 illustrates the view that each agent is both part of the environment and modeled as a separate entity. There may be any number of agents, with different degrees of heterogeneity and with or without the ability to communicate directly.

Homogeneous Non-Communicating (HoNC) MAS
The simplest multi-agent scenario involves homogeneous non-communicating (HoNC) agents. In this scenario, all of the agents have the same internal structure including goals, domain knowledge, and possible actions. They also have the same procedure for selecting among their actions. The only differences among agents are their sensory inputs and the actual actions they take: they are situated differently in the world.

HoNC Multi-Agent Goal
In the homogeneous non-communicating version of the pursuit domain, although the agents have identical capabilities and decision procedures, they may have limited information about each other's internal state and sensory inputs. Thus they may not be able to predict each other's actions. The pursuit domain with homogeneous agents is illustrated in Figure 7. Within this framework, Stephens and Merx propose a simple heuristic behavior for each agent that is based on local information [8]. They define capture positions as the four positions adjacent to the prey. They then propose a "local" strategy whereby each predator agent determines the capture position to which it is closest and moves towards that position.
The predators cannot see each other, so they cannot aim at different capture positions. Of course, a problem with this heuristic is that two or more predators may move towards the same capture position, blocking each other as they approach.
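The "local" strategy of Stephens and Merx can be sketched as follows; Manhattan distance on a grid is assumed here as the metric, which may differ from the original formulation.

```python
def local_pursuit_move(predator, prey):
    """One step of the 'local' heuristic: head for the nearest capture position."""
    px, py = prey
    # Capture positions are the four cells adjacent to the prey.
    capture_positions = [(px + 1, py), (px - 1, py), (px, py + 1), (px, py - 1)]
    dist = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])  # Manhattan distance
    target = min(capture_positions, key=lambda c: dist(predator, c))
    x, y = predator
    # Move one grid cell along the axis with the larger remaining gap.
    if abs(target[0] - x) >= abs(target[1] - y) and target[0] != x:
        x += 1 if target[0] > x else -1
    elif target[1] != y:
        y += 1 if target[1] > y else -1
    return (x, y)

step = local_pursuit_move(predator=(0, 0), prey=(3, 3))
```

Because each predator computes its target from local information only, two predators at symmetric positions can pick the same capture position, which is exactly the blocking problem noted in the text.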
Since the predators are identical, they can easily predict each other's actions given knowledge of each other's sensory input. Vidal and Durfee analyze such a situation using the Recursive Modeling Method (RMM) [9]. Levy and Rosenschein use a game-theoretic approach to the pursuit domain [10]. Korf takes the approach that each agent should greedily maximize its own local utility [11], offering this as additional support for the theory that much coordination and cooperation in both natural and man-made systems can be viewed as an emergent property of the interaction of agents maximizing their particular utility functions in the presence of environmental constraints.
However, whether or not altruism occurs, there is certainly some use for benevolent agents in MAS, and Haynes and Sen show that Korf's heuristics do not work for certain instantiations of the domain [12]. The general multi-agent scenario with homogeneous agents is illustrated in Figure 8. There are several different agents with identical structure (sensors, effectors, domain knowledge, and decision functions), but they have different sensor input and effector output; a necessary condition for MAS.

Open Issues, Techniques and Perspectives
Even in this simplest of multi-agent scenarios, there are several issues with which to deal and the techniques provided below are representative ways to deal with these issues.
 Reactive vs. deliberative agents: When designing any agent-based system, it is important to determine how sophisticated the agents' reasoning will be. Reactive agents simply retrieve pre-set behaviors, similar to reflexes, without maintaining any internal state (Balch and Arkin [13]). Deliberative agents, on the other hand, behave more as if they are thinking: searching through a space of behaviors, maintaining internal state, and predicting the effects of actions (Levy and Rosenschein [10]). The line between reactive and deliberative agents can be somewhat blurry; an agent with no internal state is certainly reactive, and one which bases its actions on the predicted actions of other agents is deliberative (Rao and Georgeff [14]).
 Local or global perspective: Another issue to consider when building a multi-agent system is how much sensor information should be available to the agents: a global perspective of the environment versus limitation to local views [15]. Better performance by agents with less knowledge is occasionally summarized by the cliché "Ignorance is Bliss" [16]. In this scenario, agents do not model each other as agents [17]; instead they consider each other as parts of the environment and affect each other's policies only as sensed objects.
 Beyond Advantages: Inspired by the concept of stigmergy, an agent may try to learn to take actions that will not directly help it in its current situation, but that may allow other similar agents to be more effective in the future. Typical RL situations with delayed reward encourage agents to learn to achieve their goals directly by propagating local reinforcement back to past states and actions [18].
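The delayed-reward propagation mentioned above is the core of tabular Q-learning; the five-state chain world below is a made-up example in which reward arrives only at the final state, so value must propagate backwards through the Bellman update.

```python
import random

def q_learning_chain(n_states=5, episodes=200, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """States 0..n-1 form a chain; actions: 0 = left, 1 = right.
    Reaching the last state yields reward 1; all other steps yield 0."""
    rnd = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            if rnd.random() < eps:                 # epsilon-greedy exploration
                a = rnd.randrange(2)
            else:                                  # greedy, ties broken toward 'right'
                a = 1 if q[s][1] >= q[s][0] else 0
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Bellman backup: local reinforcement propagates to earlier states.
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = q_learning_chain()
```

After training, the learned values decay geometrically with distance from the goal (roughly gamma to the power of the remaining steps), showing how a reward observed only at the end shapes behavior at every earlier state.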

Heterogeneous Non-Communicating (HeNC) MAS
Allowing agents to act in a heterogeneous non-communicating (HeNC) multi-agent domain adds a great deal of potential power, along both the dimension of complexity and the question of whether agents are benevolent or competitive.

HeNC Multi-Agent Goal
In the HeNC multi-agent scenario the predators are controlled by separate agents, and their goals, actions, and domain knowledge may differ. In addition, the prey, which inherently has goals different from those of the predators, can now be modeled as an agent (Figure 9), as in Haynes et al. [19]. The general multi-agent scenario with heterogeneous non-communicating agents is depicted in Figure 10. The agents are situated differently in the environment, which causes them to have different sensory inputs and necessitates their taking different actions. In this scenario, however, the agents can have much more significant differences: they may have different goals, actions, and/or domain knowledge. This heterogeneity among agents adds a great deal of power for the system designer.

Open Issues, Techniques and Perspectives
Even without communication, numerous issues that were not present in the homogeneous agent scenario arise in this scenario. These issues and existing techniques along with further learning opportunities are summarized below.
 Benevolence vs. competitiveness: When designing a multi-agent system, it is important to consider whether the different agents will be benevolent or competitive. Even if they have different goals, the agents can be benevolent, helping one another achieve their respective goals [20], or the agents may be selfish and consider only their own goals.
In some extreme scenarios, the agents may be involved in a zero-sum situation and actively oppose other agents' goals in order to achieve their own. Korf used greedy agents that minimized their own distance to the target/goal [11], and similarly, Levy and Rosenschein used Game Theory to study how agents can cooperate despite maximizing their own utilities [10]. Ridley provides a detailed chronicle and explanation of apparently cooperative co-evolution; see also Haynes and Sen [23] and Grefenstette and Daley [24]. One problem to contend with in competitive rather than cooperative co-evolution is the possibility of an escalating "arms race" with no end: competing agents might continually adapt to each other in more and more specialized ways, never stabilizing at a good behavior. Of course, in a dynamic environment, it may not be feasible or even desirable to evolve a stable behavior. Another issue in competitive co-evolution is the credit/blame assignment problem. When the performance of an agent improves, it is not necessarily clear whether the improvement is due to an improvement in that agent's behavior or to a negative change in the opponent's behavior. Similarly, if an agent's performance gets worse, the blame or credit could belong to that agent or to the opponent. One way to deal with the credit/blame problem is to fix one agent while evolving the other and then switch; of course, this method encourages the arms race more than ever. Nevertheless, Rosin and Belew use this technique, along with an interesting method for maintaining diversity in genetic populations, to evolve agents that can play TicTacToe, Nim, and a simple version of Go [25].
 Resource management: Heterogeneous agents may have interdependent actions due to limited resources needed by several of the agents. Example domains include network traffic problems, in which several different agents must send information through the same network, and load balancing, in which several computer processes or users have a limited amount of computing power to share among them. Designers of multi-agent systems with limited resources must decide how the agents will share the resources; a cautionary example is Braess' paradox [26], the phenomenon of adding more resources to a network yet getting worse performance.
 Beyond Advantages: When several different agents are evolving at the same time, a change in an agent's performance may be due to its own behavior or to the behavior of the other agents. Yet if agents are to evolve effectively, they must have a reasonable idea of whether a given change in behavior is beneficial or detrimental. Methods of objective fitness measurement are also needed for testing various evolution techniques. In competitive (especially zero-sum) situations, it is difficult to provide adequate performance measurements over time.
In a dynamic environment, these flexible agents are more effective if they can switch roles dynamically. Dynamic role assumption is a particularly good opportunity for ML researchers in MAS.

Heterogeneous Communicating (HEC) MAS
Even though heterogeneous multi-agent systems can be very complex and powerful, the full power of MAS is unleashed when adding the ability for agents to communicate with one another. By sending their sensor inputs to and receiving their commands from one agent, all the other agents can surrender control to that single agent. Thus communicating heterogeneous agents can span the full range of complexity in extra-environmental agent systems.

HEC Multi-Agent Goal
Communication creates new possibilities for agent behavior, as agents exchange information in order to capture the target more effectively, as illustrated in Figure 11. The continuum of complexity leading into the general multi-agent scenario appears in Figure 12. In this scenario, we allow the agents to be heterogeneous to any degree, from homogeneity to full heterogeneity. The key addition is the ability of agents to transmit information directly to each other. Tan [27] uses communicating agents in the pursuit domain [29]. When the agents stop making sufficient progress toward the target using one strategy, they should move to the next most expensive strategy.

Open Issues, Techniques and Perspectives
The issue of benevolence vs. competitiveness, as discussed in the previous subsections, becomes more complicated in this context. The issues are summarized in Table 7.

Conclusions
This chapter presents an overall description of the field of Multi-Agent Systems (MAS), serving as an introduction to the field for readers ranging from newcomers to system engineers. A series of three increasingly complex and powerful scenarios is presented. The first scenario involves systems with homogeneous non-communicating agents, the second involves heterogeneous non-communicating agents, and the final, general MAS scenario involves communicating agents with any degree of heterogeneity.

Graph Searching Methods and Swarm Intelligence
The notion of search here is computation inside the agent; it is not only different from searching in the world but also different from searching the web, which involves searching for information [1]. The idea of search in this chapter is straightforward, meaning an internal representation of a path to a goal. More specifically, the agent constructs a set of potential partial solutions to a problem that can be checked to see if they truly are solutions or if they could lead to solutions. Search proceeds by repeatedly selecting a partial solution, stopping if it is a path to a goal, and otherwise extending it by one more arc in all possible ways [2]. Therefore, search underlies much of artificial intelligence and strategies for providing the best solution to a specific problem.

Introduction
Intelligent action [3] is formulated in terms of a state space that contains all of the necessary information to predict the effects of an action and to determine whether a state is a goal state, provided that:
 the agent has perfect knowledge of the state space and can observe it;
 the agent has a set of actions that have known deterministic effects;
 the agent can recognize a goal state;
 the agent will perform the sequence of actions to get from the current state to a goal state.
In addition, a state-space problem consists of:
 a set of states and a distinguished set of start states;
 a set of actions available to the agent in each state, together with an action function;
 a set of goal states (a Boolean function) and a criterion that specifies the quality of an acceptable solution (e.g., an optimal solution of minimal total cost).

Generic Graph Searching Mechanism
Therefore, many problem-solving tasks can be transformed into the problem of finding a path in a graph [3]. In this sub-section, a search mechanism in terms of paths in directed graphs is presented [4]. Searching in graphs provides an appropriate level of abstraction for studying simple or complex problems of a particular domain. More particularly, a (directed) graph is a set of nodes and a set of directed arcs between nodes. The idea is to find a path along these arcs from a start node to a goal node; this abstraction is what makes it possible to represent a problem as a graph. A directed graph consists of:
 a set N of nodes, and
 a set A of ordered pairs of nodes, called arcs.
A node can be anything, and there can be infinitely many nodes and arcs. The arc ⟨n1, n2⟩ is an outgoing arc from n1 and an incoming arc to n2. A node n2 is a neighbor of n1 if there is an arc from n1 to n2; that is, if ⟨n1, n2⟩ ∈ A. A path from node s to node g is a sequence of nodes n0, n1, ..., nk such that s = n0, g = nk, and ⟨ni-1, ni⟩ ∈ A; that is, there is an arc from ni-1 to ni for each i. Sometimes it is useful to view a path as the sequence of arcs ⟨n0, n1⟩, ⟨n1, n2⟩, ..., ⟨nk-1, nk⟩, or as a sequence of labels of these arcs. A cycle is a nonempty path such that the end node is the same as the start node; that is, a cycle is a path n0, n1, ..., nk such that n0 = nk and k ≠ 0. A directed graph without any cycles is called a directed acyclic graph (DAG). A tree is a DAG in which there is one node with no incoming arcs and every other node has exactly one incoming arc. Consider the problem of the delivery ship finding a path from location o103 to location r123 in a domain where the interesting locations are named. In figure 13, each arc is shown with the associated cost of getting from one location to the next. If o103 were a start node and r123 were a goal node, each of these three paths would be a solution to the graph-searching problem.

Generic Graph Searching Algorithm
Nevertheless, in many problems the search graph is not given explicitly; it is dynamically constructed as needed. All that is required for the search algorithms is a way to generate the neighbors of a node and to determine whether a node is a goal node. The forward branching factor of a node is the number of arcs leaving the node; the backward branching factor is the number of arcs entering it. These factors provide measures of the complexity of graphs. When analyzing the time and space complexity of the search algorithms, we assume that the branching factors are bounded from above by a constant. The intuitive idea behind the generic search algorithm (figure 14), given a graph, a set of start nodes, and a set of goal nodes, is to incrementally explore paths from the start nodes by maintaining a frontier (or fringe) of paths from a start node that have been explored. The frontier contains all of the paths that could form initial segments of paths from a start node to a goal node.
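Since figure 14 is not reproduced here, the generic mechanism can be sketched roughly as follows; this is a minimal Python sketch in which the graph, the function names, and the intermediate locations between o103 and r123 are illustrative assumptions, and the selection strategy is deliberately left to the frontier data structure:

```python
from collections import deque

def generic_search(neighbors, start_nodes, is_goal):
    """Generic graph search: maintain a frontier of paths from the start
    nodes, repeatedly select a path, and either return it (if it ends at
    a goal) or extend it by one more arc in all possible ways."""
    frontier = deque([node] for node in start_nodes)  # each element is a path
    while frontier:
        path = frontier.pop()          # the selection strategy is left open
        node = path[-1]
        if is_goal(node):
            return path
        for nxt in neighbors(node):    # extend the path by one arc
            frontier.append(path + [nxt])
    return None                        # no path from a start node to a goal

# Toy delivery-domain graph as an adjacency dict (node names assumed).
graph = {"o103": ["b3", "ts"], "b3": ["b1"], "b1": ["c1"],
         "ts": ["mail"], "c1": ["r123"], "mail": [], "r123": []}
print(generic_search(graph.get, ["o103"], lambda n: n == "r123"))
```

Because the frontier is a generic container, the concrete search strategies of the following sub-sections differ only in which path this loop selects next.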

Search-Strategy Paradigms
A problem determines the graph and the goal but not which path to select from the frontier.
The search strategy specifies which paths are selected from the frontier by modifying how the selection of paths in the frontier is implemented.

Depth-First Search Method
The first strategy is depth-first search (figure 15) [9], in which the frontier acts as a last-in, first-out stack: elements are added to the stack one at a time, and the element taken off the frontier at any time is the last one that was added.
Implementing the frontier as a stack results in paths being pursued in a depth-first manner: one path is searched to its completion before an alternative path is tried. This method involves backtracking; the algorithm selects a first alternative at each node, and it backtracks to the next alternative when it has pursued all of the paths from the first selection. Some paths may be infinite when the graph has cycles or infinitely many nodes, in which case a depth-first search may never stop. Because depth-first search is sensitive to the order in which the neighbors are added to the frontier, the ordering can be done either statically (so that the order of the neighbors is fixed) or dynamically (where the ordering of the neighbors depends on the goal). Nevertheless, the depth-first search algorithm does not specify the order in which the neighbors are added to the stack that represents the frontier; thus, the efficiency of the algorithm is extremely sensitive to the selected ordering. Depth-first search is an appropriate strategy when space is restricted, when many solutions with long path lengths exist, or when the order in which the neighbor nodes are added to the stack can be fine-tuned so that solutions are found on the first try. It is not appropriate when infinite paths might exist or when solutions exist at shallow depth.
In addition, the depth-first search mechanism is the basis for a number of other algorithms, such as iterative deepening.

Breadth-First Search Method
In breadth-first search (figure 16) [10], the frontier is implemented as a FIFO (first-in, first-out) queue; however, it is not used very often because of its space complexity. This approach implies that the paths from the start node are generated in order of the number of arcs in the path: one of the paths with the fewest arcs is selected at each stage. Breadth-first search is useful if space is not a problem, when the solution containing the fewest arcs is sought, and in cases where few solutions exist (at least one of which has a short path length) or where infinite paths may exist. It is not recommended when all solutions have a long path length or when some heuristic knowledge is available.
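The only difference between the two strategies above is which element the frontier yields next. A small sketch (the toy graph and node names are assumed for illustration) makes the contrast concrete by recording the order in which nodes are expanded:

```python
from collections import deque

graph = {"s": ["a", "b"], "a": ["c"], "b": ["d"], "c": ["g"], "d": ["g"], "g": []}

def search(start, goal, depth_first):
    frontier = deque([[start]])
    expanded = []                       # record the order nodes are expanded
    while frontier:
        # LIFO stack gives depth-first; FIFO queue gives breadth-first
        path = frontier.pop() if depth_first else frontier.popleft()
        node = path[-1]
        expanded.append(node)
        if node == goal:
            return path, expanded
        for nxt in graph[node]:
            frontier.append(path + [nxt])
    return None, expanded

print(search("s", "g", depth_first=True)[1])   # one branch to completion first
print(search("s", "g", depth_first=False)[1])  # level by level from s
```

Depth-first expands s, b, d, g (one branch pursued to completion), while breadth-first expands s, a, b, c, d, g (level by level), illustrating why breadth-first finds a solution with the fewest arcs at the cost of a larger frontier.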

Lowest-Cost-First Search Method
When a non-unit cost is associated with arcs, we search for the solution that minimizes the total cost of the path [11][12]. The previously described search algorithms are not guaranteed to find minimum-cost paths. The simplest search method that is guaranteed to find a minimum-cost path is similar to breadth-first search; however, instead of expanding a path with the fewest arcs, it selects a path with the minimum cost. This is implemented by treating the frontier as a priority queue ordered by the cost function. If the costs of the arcs are bounded below by a positive constant and the branching factor is finite, lowest-cost-first search is guaranteed to find an optimal solution if one exists. The bounded arc cost is needed to guarantee that lowest-cost-first search will find an optimal solution; without such a bound there can be infinite paths with finite cost. For example, there could be nodes n0, n1, n2, ... with an arc ⟨ni-1, ni⟩ for each i > 0 with cost 1/2^i. Infinitely many paths of the form n0, n1, n2, ..., nk exist, all of which have a cost of less than 1. If there is an arc from n0 to a goal node with a cost greater than or equal to 1, it will never be selected. This is the basis of Zeno's paradoxes, which Aristotle wrote about more than 2,300 years ago. Like breadth-first search, lowest-cost-first search is typically exponential in both space and time.
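A minimal sketch of lowest-cost-first search, with the frontier as a priority queue ordered by path cost (the graph and its arc costs are invented for illustration):

```python
import heapq

def lowest_cost_first(graph, start, goal):
    """Always expand the cheapest path on the frontier; this yields an
    optimal solution when arc costs are bounded below by a positive
    constant and the branching factor is finite."""
    frontier = [(0, [start])]           # priority queue of (cost, path)
    while frontier:
        cost, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return cost, path
        for nxt, arc_cost in graph.get(node, []):
            heapq.heappush(frontier, (cost + arc_cost, path + [nxt]))
    return None

# Arcs are given as (neighbor, cost) pairs; the numbers are illustrative.
graph = {"o103": [("b3", 4), ("ts", 8)], "b3": [("b1", 4)],
         "b1": [("r123", 9)], "ts": [("r123", 6)]}
print(lowest_cost_first(graph, "o103", "r123"))
```

Here the cheaper two-arc route via ts (total cost 14) is returned instead of the three-arc route via b3 and b1 (total cost 17), which a breadth-first strategy counting arcs could not distinguish.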

Heuristic Search Strategies
A number of refinements can be made to the preceding strategies (i.e., cycle checking, multiple-path pruning, iterative deepening, branch and bound, and bidirectional and island-driven search). Moreover, dynamic programming can be used for path finding and for constructing heuristic functions [13][14]. None of the search methods in the preceding subsection take the goal into account [15][16][17]. One form of heuristic information about which nodes seem the most promising is a heuristic function h(n), which takes a node and returns a non-negative real number that is an estimate of the path cost from that node to a goal node. The heuristic function is a way to inform the search about the direction to a goal. A standard method of deriving a heuristic function is to solve a simpler problem and to use the actual cost in the simplified problem as the heuristic function of the original problem [18][19][20]. A simple use of a heuristic function is to order the neighbors that are added to the stack representing the frontier in depth-first search. This search chooses the locally best path, but it explores all paths from the selected path before it selects another path. Although it is often used, it suffers from the problems of depth-first search. Another way to use a heuristic function is to always select a path on the frontier with the lowest heuristic value (best-first search). This usually does not work very well: it can follow paths that look promising because they are close to the goal, but the costs of the paths may keep increasing.

FIGURE 17: A graph that is bad for best-first search
Consider the graph shown in figure 17, where the cost of an arc is its length. The aim is to find the shortest path from s to g. Suppose the Euclidean distance to the goal g is used as the heuristic function. A heuristic depth-first search will select the node below s and will never terminate. Similarly, because all of the nodes below s look good, a best-first search will cycle between them, never trying an alternate route from s. Table 3 gives a summary of the searching strategies presented so far.

Other strategies, such as A*, select from the frontier a path p with minimal cost(p) + h(p); A* is guaranteed to halt and has exponential space complexity. In Table 3, "Halts?" means: is the method guaranteed to halt if there is a path to a goal on a (possibly infinite) graph with a finite number of neighbors for each node and where the arc costs have a positive lower bound? Those search strategies where the answer is "Yes" have worst-case time complexity that increases exponentially with the length of the path; algorithms that are not guaranteed to halt have infinite worst-case time complexity. "Space" refers to the space complexity, which is either linear or exponential in the path length.
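As a sketch of the A* strategy mentioned above, which selects the frontier path p minimizing cost(p) + h(p): the graph, arc costs, and heuristic values below are assumed for illustration, with h kept admissible (never overestimating the remaining cost).

```python
import heapq

def a_star(graph, start, goal, h):
    """A* search: the frontier is ordered by f(p) = cost(p) + h(end of p)."""
    frontier = [(h(start), 0, [start])]   # (f-value, cost, path)
    while frontier:
        _, cost, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return cost, path
        for nxt, arc in graph.get(node, []):
            new_cost = cost + arc
            heapq.heappush(frontier, (new_cost + h(nxt), new_cost, path + [nxt]))
    return None

graph = {"s": [("a", 1), ("b", 4)], "a": [("g", 5)], "b": [("g", 1)]}
h = {"s": 4, "a": 5, "b": 1, "g": 0}.get  # admissible estimates (assumed)
print(a_star(graph, "s", "g", h))
```

Unlike best-first search, which would chase the low-h node, A* weighs the heuristic against the cost already paid, so the cheaper route via b (total cost 5) is found here.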

Combinatorial Optimization
A large number of well-known algorithms based on combinatorial programming, linear programming (LP), and nonlinear programming (NLP) are applied to solve a variety of optimization problems. In small and simple models, these algorithms are generally successful in determining the global optimum. But in reality, many optimization problems are too complex and complicated to solve using algorithms based on LP and NLP methods. Combinatorial optimization can be defined as the mathematical study of finding an optimal arrangement, grouping, ordering, or selection of discrete objects, usually finite in number [24]. A combinatorial optimization problem can be either simple or complex [25]. We call the problem simple if we can develop an efficient algorithm that solves it to optimality in polynomial time. If no efficient algorithm exists to solve it to optimality in polynomial time, we call the problem complex. An optimal algorithm for a complex problem requires a number of computational steps that grows exponentially with the problem size. The computational drawback of such algorithms for complex problems motivates the development of metaheuristic algorithms to obtain a (near-)optimal solution [26].
Combinatorial optimization is thus the topic of finding an optimal object from a finite set of objects, in cases where exhaustive search is not feasible [27]. It is a subset of mathematical optimization that is related to operations research, algorithm theory, and computational complexity theory. Combinatorial optimization operates on the domain of those optimization problems in which the set of feasible solutions is discrete, or can be reduced to a discrete set, and in which the goal is to find the best solution [28]. Known paradigms involving combinatorial optimization are the traveling salesman problem ("TSP") and the minimum spanning tree problem ("MST"). Moreover, combinatorial optimization has a number of applications in several fields, including artificial intelligence, machine learning, mathematics, auction theory, and software engineering [30].

Swarm Intelligence
A single ant or bee isn't smart, but their colonies are. A colony can solve problems unthinkable for individual ants, such as finding the shortest path to the best food source, allocating workers to different tasks, or defending a territory from neighbors. As individuals, ants might be tiny dummies, but as colonies they respond quickly and effectively to their environment. They do it with something called swarm intelligence. The collective abilities of such animals, none of which grasps the big picture but each of which contributes to the group's success, seem miraculous even to the biologists who know them best. Yet during the past few decades, researchers have come up with intriguing insights. One key to an ant colony, for example, is that no one's in charge. No generals command ant warriors. No managers boss ant workers. The queen plays no role except to lay eggs. Even with half a million ants, a colony functions just fine with no management at all, at least none that we would recognize. It relies instead upon countless interactions between individual ants, each of which is following simple rules of thumb. Scientists describe such a system as self-organizing, and swarm intelligence provides insights that help humans understand and manage such complex systems. In other words, swarm intelligence (SI) is the collective behavior of decentralized, self-organized systems, natural or artificial, as the concept is employed within the field of artificial intelligence.

Particle Swarm Optimization
Particle swarm optimization (PSO) is an optimization algorithm for dealing with problems in which a best solution can be represented as a point or surface in an n-dimensional space.
Hypotheses are plotted in this space and seeded with an initial velocity, as well as a communication channel between the particles. Particles then move through the solution space and are evaluated according to some fitness criterion after each timestep.

Ant Colony Optimization
Ant colony optimization (ACO) is a meta-heuristic algorithm modeled on the actions of an ant colony. ACO is a probabilistic technique useful in problems that deal with finding better paths through graphs. Artificial 'ants' (simulation agents) locate optimal solutions by moving through a parameter space representing all possible solutions. (The expression "swarm intelligence" was introduced by Gerardo Beni and Jing Wang in 1989, in the context of cellular robotic systems.) Natural ants lay down pheromones directing each other to resources while exploring their environment. The simulated 'ants' similarly record their positions and the quality of their solutions, so that in later simulation iterations more ants locate better solutions [35][36][37]. An extended analysis of the Ant Colony Optimization metaheuristic is given in the next section of this chapter.

Artificial bee colony algorithm
The artificial bee colony algorithm (ABC) is a meta-heuristic algorithm that simulates the foraging behavior of honey bees. The ABC algorithm has three phases: employed bee, onlooker bee, and scout bee. In the employed bee and onlooker bee phases, bees exploit the food sources by local searches in the neighborhood of selected solutions; the selection is deterministic in the employed bee phase and probabilistic in the onlooker bee phase. The scout bee phase is an analogy of abandoning exhausted food sources in the foraging process: solutions that are no longer beneficial for search progress are abandoned, and new solutions are inserted in their place to explore new regions of the search space. The algorithm has a well-balanced exploration and exploitation ability.
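The three phases can be sketched roughly as follows. This is a simplified illustration rather than a reference ABC implementation; the sphere test function, the parameter names, and the fitness transform 1/(1 + f) are common choices assumed here.

```python
import random

def abc_minimize(f, dim, bounds, n_sources=10, limit=20, iters=200):
    """Minimal Artificial Bee Colony sketch for minimizing a non-negative f."""
    lo, hi = bounds
    sources = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_sources)]
    trials = [0] * n_sources            # stagnation counter per food source

    def neighbor(i):
        # perturb one dimension relative to a random partner source
        k, j = random.randrange(n_sources), random.randrange(dim)
        cand = list(sources[i])
        cand[j] += random.uniform(-1, 1) * (sources[i][j] - sources[k][j])
        cand[j] = min(max(cand[j], lo), hi)
        return cand

    def try_improve(i):
        cand = neighbor(i)
        if f(cand) < f(sources[i]):     # greedy selection
            sources[i], trials[i] = cand, 0
        else:
            trials[i] += 1

    for _ in range(iters):
        for i in range(n_sources):      # employed bee phase (deterministic)
            try_improve(i)
        fits = [1.0 / (1.0 + f(s)) for s in sources]
        total = sum(fits)
        for _ in range(n_sources):      # onlooker bee phase (probabilistic)
            r, acc = random.uniform(0, total), 0.0
            for i, ft in enumerate(fits):
                acc += ft
                if acc >= r:
                    break
            try_improve(i)
        for i in range(n_sources):      # scout bee phase
            if trials[i] > limit:       # abandon an exhausted source
                sources[i] = [random.uniform(lo, hi) for _ in range(dim)]
                trials[i] = 0
    return min(sources, key=f)

best = abc_minimize(lambda x: sum(v * v for v in x), dim=2, bounds=(-5, 5))
print(best)  # close to the minimum at the origin
```

The employed/onlooker split realizes the exploitation side, while the scout phase re-injects exploration whenever a source stagnates for more than `limit` trials.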

Artificial immune systems
Artificial immune systems (AIS) apply the abstract structure and function of the immune system to computational systems, and investigate the application of these systems to solving computational problems from mathematics, engineering, and information technology. AIS is a sub-field of biologically inspired computing and natural computation, with interests in the machine learning area of artificial intelligence research.

Bat algorithm
The Bat Algorithm (BA) is a swarm-intelligence-based algorithm inspired by the echolocation behavior of bats. BA automatically balances exploration (long-range jumps around the global search space to avoid getting stuck around one local maximum) with exploitation (searching in more detail around known good solutions to find local maxima) by controlling the loudness and pulse emission rates of simulated bats in the multi-dimensional search space.

Conclusion
Metaheuristics are strategies that guide a search process to explore the search space and find a (near-)optimal solution. A metaheuristic strategy is generally applied to problems classified as NP-complete, but it can also be applied to other combinatorial optimization problems. Metaheuristics are among the best known methods for obtaining a good enough solution at a reasonable computational cost; well-known examples include tabu search, memetic algorithms, ant colony optimization, particle swarm optimization, etc.
The following sub-sections provide a detailed overview of Particle Swarm and Ant Colony Optimization, which are the main topic of interest to this dissertation as presented in the next chapters.

Particle Swarm Optimization Overview
In computer science, particle swarm optimization (PSO) is a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. It solves a problem by having a population of candidate solutions, here dubbed particles, and moving these particles around in the search-space according to simple mathematical formulae over the particle's position and velocity. Each particle's movement is influenced by its local best known position, but is also guided toward the best known positions in the search-space, which are updated as better positions are found by other particles. This is expected to move the swarm toward the best solutions.
PSO was originally attributed to Kennedy and Eberhart for simulating social behavior, as a stylized representation of the movement of organisms in a bird flock or fish school. The algorithm was later simplified, and it was observed to be performing optimization. PSO is a metaheuristic, as it makes few or no assumptions about the problem being optimized and can search very large spaces of candidate solutions. However, metaheuristics such as PSO do not guarantee that an optimal solution is ever found. More specifically, PSO does not use the gradient of the problem being optimized, which means PSO does not require that the optimization problem be differentiable, as is required by classic optimization methods such as gradient descent and quasi-Newton methods.
A basic variant of the PSO algorithm works by having a population (called a swarm) of candidate solutions (called particles). These particles are moved around in the search-space according to a few simple formulae. The movements of the particles are guided by their own best known position in the search-space as well as by the entire swarm's best known position. When improved positions are discovered, these then come to guide the movements of the swarm. The process is repeated, and by doing so it is hoped, but not guaranteed, that a satisfactory solution will eventually be discovered. The choice of PSO parameters can have a large impact on optimization performance, so selecting PSO parameters that yield good performance has been the subject of much research. The PSO parameters can also be tuned by using another overlaying optimizer, a concept known as meta-optimization, and they can be tuned for various optimization scenarios [31][32][34].
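A basic variant as described above might be sketched like this; the parameter values w, c1, c2 and the sphere test function are illustrative assumptions, not tuned choices:

```python
import random

def pso_minimize(f, dim, bounds, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5):
    """Basic PSO sketch: particles move under the pull of their own best
    position (pbest) and the swarm's best position (gbest)."""
    lo, hi = bounds
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(p) for p in pos]
    gbest = min(pbest, key=f)
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                pos[i][j] = min(max(pos[i][j] + vel[i][j], lo), hi)
            if f(pos[i]) < f(pbest[i]):          # update personal best
                pbest[i] = list(pos[i])
                if f(pbest[i]) < f(gbest):       # update swarm best
                    gbest = list(pbest[i])
    return gbest

best = pso_minimize(lambda x: sum(v * v for v in x), dim=2, bounds=(-10, 10))
print(best)  # close to the minimum at the origin
```

The inertia term w preserves momentum, while the two random coefficients weight the pull toward pbest and gbest, which is exactly the velocity/position update pair given later in equations (1) and (2) of this chapter.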

Ant Colony Optimization Overview
Ant Colony Optimization (ACO) is a metaheuristic approach proposed by Dorigo in 1992 to solve combinatorial optimization problems. Inspired by the behavior of ants forming pheromone trails (a pheromone is a trace of a chemical substance that can be smelled by others) in search of food (figure 18), ACO belongs to a class of algorithms which can be used to obtain good enough solutions in reasonable computational time for combinatorial optimization problems [38][39][40]. Ants communicate with one another by depositing pheromones. Initially, in search of food, ants wander randomly and, upon finding a food source, return to their colony.
On their way back to the colony, they deposit pheromones on the trail. Other ants then tend to follow this pheromone trail to the food source, and on their way back may either take a new trail, which might be shorter or longer than the previous trail, or come back along the previously laid pheromone trail. Also, on their way back, the other ants deposit pheromones on the trail. Pheromones have a tendency to evaporate with time. Hence, over a period of time, the shortest trail (path) from the food source to the colony becomes more attractive and has a larger amount of pheromone deposited compared with other trails. Initially, a single ant, called "blitz," goes from the colony to the food source via the blue pheromone trail.
As time progresses, more and more ants either follow this blue trail or form their own shorter trails (the red and orange trails). Eventually, the shortest trail (red) becomes more attractive and is taken by all the ants from the colony to the food source, while the other trails evaporate over a period of time [35][36][37].
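The trail story above can be imitated in a few lines; the trail lengths, evaporation rate, and deposit rule 1/length below are illustrative assumptions rather than part of any particular ACO variant:

```python
import random

# Three alternative trails from the colony to the food source.
# Ants pick a trail with probability proportional to its pheromone;
# shorter trails get a stronger deposit (1/length), and all pheromone
# evaporates a little each iteration.
lengths = {"blue": 10.0, "orange": 7.0, "red": 4.0}
pheromone = {t: 1.0 for t in lengths}
rho, n_ants = 0.1, 20                 # evaporation rate, ants per iteration

random.seed(1)
for _ in range(200):
    deposits = {t: 0.0 for t in lengths}
    total = sum(pheromone.values())
    for _ in range(n_ants):
        r, acc = random.uniform(0, total), 0.0
        for trail, tau in pheromone.items():   # roulette-wheel choice
            acc += tau
            if acc >= r:
                break
        deposits[trail] += 1.0 / lengths[trail]  # shorter => more pheromone
    for t in pheromone:
        pheromone[t] = (1 - rho) * pheromone[t] + deposits[t]

# The shortest trail ("red") ends up carrying the most pheromone.
print(max(pheromone, key=pheromone.get))
```

The positive feedback (more pheromone attracts more ants, which deposit more pheromone) combined with evaporation is exactly the mechanism by which the red trail comes to dominate in the narrative above.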
In the present case study, a set of 13 ports of the Aegean Sea (including a depot port) is considered.

Motivation and Contribution
The Vehicle Routing Problem (VRP), which was first introduced by Dantzig and Ramser [1], is an NP-hard combinatorial optimization problem. Recently, metaheuristic techniques such as tabu search [2], simulated annealing [3], and genetic algorithms [4] have been investigated.
In the last few years there has been great interest in algorithms inspired by the observation of natural phenomena. In this study in particular, the Particle Swarm Optimization (PSO) algorithm [5,6,7] is investigated. The inspiration and basic idea of the PSO algorithm is the way a flock of birds behaves during the search for food. There is always a bird in the flock that can smell the food better than the rest of the flock. Because the birds of a flock communicate with each other and transmit information, especially good information, the birds will eventually flock to the place where food can be found. In the present case study, we apply the PSO algorithm to a VRP given as a set of 13 ports (including a depot port) of the Aegean Sea in Greece [8].

The Vehicle Routing Problem (VRP)
The VRP [9] is a difficult combinatorial optimization problem and can be described as follows: "A set of customers is to be serviced by a fleet of vehicles from a central depot. The locations of the customers and the depot are given. Each vehicle has a limited capacity for carrying goods and for delivering goods to, or picking up goods from, the customers. In the classic Vehicle Routing Problem we consider that all vehicles have the same capacity and goods are picked up from the customers." From a mathematical point of view [8], the VRP can be expressed via a complete weighted graph G = (V, E), where V = {0, 1, 2, ..., n} is a set of nodes, E = {(i, j) | i, j ∈ V} is a set of edges, and node 0 is the depot.

If we consider that a vehicle has unlimited capacity and is used to service the whole set of customers, then the problem reduces to a Travelling Salesman Problem (TSP). Thus, the VRP is a Θ-TSP problem, where Θ is the number of vehicles. In the Θ-TSP, the Θ salesmen have to cover the given cities (nodes), and each city (node) must be visited by exactly one salesman. In the VRP, the number of vehicles, Θ, is also often considered as a minimization criterion in addition to the total travelled distance. Generally, in the VRP three basic objectives can be distinguished:
 minimize the number of vehicles used;
 minimize the total distance or time travelled;
 minimize a combination of the number of vehicles used and the total distance travelled.

Particle Swarm Optimization (PSO)
The Particle Swarm Optimization (PSO) algorithm [5,6,7] simulates the behavior of a flock of birds. The algorithm uses a number of entities (particles), where each entity represents a possible solution to the optimization problem. The basic characteristics of each particle i are:
 x_i: the current position of the particle;
 v_i: the current velocity of the particle;
 p_i: the best position of the particle (personal best position).
The variable p_i is the best position in which the particle has been found so far, and therefore its best solution for the objective function. The algorithm is initialized with a set of random particles (solutions) and then searches for the optimum by updating generations. In each iteration, each particle is updated using two "best" values (if they have been achieved).
The first is the particle's best solution achieved so far, called pbest. The other "best" value is the best value found so far by any particle of the flock, called gbest. After finding these two best values, each particle updates its velocity (1) and position (2) according to the following equations:

v_ij(t+1) = w · v_ij(t) + c1 · r1 · (pbest_ij − x_ij(t)) + c2 · r2 · (gbest_j − x_ij(t))    (1)

x_ij(t+1) = x_ij(t) + v_ij(t+1)    (2)

Where:
 j: the dimension (if there is more than one dimension);
 t: the number of the iteration;
 v_ij: the velocity of particle i (in dimension j) at time t;
 x_ij: the position of particle i (in dimension j) at time t;
 w: the inertia weight;
 c1, c2: the cognitive and social acceleration coefficients;
 r1, r2: random numbers uniformly distributed in [0, 1].

PSO-2-VRP Application (encapsulation)
In the present sub-section the application of Particle Swarm Optimization to the Vehicle Routing Problem is described in full detail.

Problem Formation
A subset of the Aegean Sea's islands is given in Table 1; the numbers in its cells represent distances in nautical miles [10]. Every island has a specific demand for goods, provided by the available shipping fleet, which carries them from the port of Piraeus, designated as the depot of every possible route to the islands. The goods are placed in small containers. Each route has a corresponding traversal cost, proportional to the distance between the islands it connects.
Our goal is, under known supply and demand constraints, to minimize the total fuel costs and port dues, expressed by the problem objective function:

min Z = Σ_i Σ_j s_ij · (c · d_ij + f_j)

Where:
 s_ij: the number of ships travelling from island i to island j;
 d_ij: the distance between islands i and j (nautical miles);
 c: the cost of fuel consumed per mile (in € per mile);
 f_j: the fee of port j (in €).
For reasons of simplicity, the return cost of each ship after a successful route completion is not included. The constraints of the optimization problem are [7]: demand must be satisfied at all islands,

Σ_i x_ij ≥ D_j for every island j,

where:
 x_ij: the number of containers transported from island i to island j;
 D_j: the demand at port j.
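As an illustration of this cost model, the sketch below evaluates the total fuel and port cost of a candidate routing, reading the objective as "ships on an arc times (fuel per mile times distance, plus destination port fee)". The toy instance (two islands served from Piraeus, all distances, fees and the fuel price) is entirely hypothetical.

```python
def route_cost(ships, dist, fuel_cost, port_fee):
    """Total cost: for every arc (i, j), ships[i][j] vessels each pay fuel
    for the distance dist[i][j] plus the destination port fee port_fee[j].
    The return leg is ignored, as in the problem definition."""
    total = 0.0
    for i in ships:
        for j in ships[i]:
            total += ships[i][j] * (fuel_cost * dist[i][j] + port_fee[j])
    return total

# Hypothetical toy instance: Piraeus ("P") and two islands "A", "B".
dist = {"P": {"A": 100, "B": 150}, "A": {"B": 60}}   # nautical miles
ships = {"P": {"A": 2, "B": 1}, "A": {"B": 1}}       # ships per arc
fees = {"A": 500, "B": 700}                          # € per port call
print(route_cost(ships, dist, 10.0, fees))  # → 6500.0
```

A PSO particle would encode such a routing and be scored by this function, with the demand constraint checked separately.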

Assumptions:
 Every particle (ship) starts its route from a specified island (the depot, the port of Piraeus), carrying cargo equal to its maximum capacity.
 On every visit, the port's demand for containers should be satisfied, or all cargo should be unloaded.
 If a port's demand has already been fulfilled, a ship can skip it (but may still use it as an intermediate island/port, if the total cost is lower).
 Every ship with no cargo left returns to the depot (the return cost is not included, as stated in the problem definition).
The PSO Algorithm:
 Initialize the position and velocity of each particle (ship).
 Calculate the initial cost.
 Define and calculate pbest and gbest.
 Set the iteration count to 1. Move from island j−1 to island j, provided that: 1. the coastal route (j−1, j) exists; 2. the demand at island j has not been fulfilled; 3. island j is not revisited by ship i (unless it serves as an intermediate port, as allowed above).
 Update pbest and gbest.
 Continue iterating until gbest equals pbest for all particles/ships.

Computational Experience
We now apply our problem formulation to the set of 13 islands of the Aegean Sea shown in Table 4; the numbers in its cells represent distances in nautical miles.
The depot is the port of Piraeus. The port demand and supply are shown in Table 5 [10]. The final results (best routes for minimal cost, the number of ships on each route, and the number of containers) are shown in Table 5b. Finally, the graphical representation of the objective function optimization is shown in Figure 19.

Conclusions
The Particle Swarm Optimization (PSO) algorithm was applied to the vehicle routing problem; in particular, it was applied to a group of 13 islands of the Aegean Sea.
A key feature of the PSO algorithm is that, in each iteration, a particle uses two values to update its position: the best position it has found during the execution of the algorithm (pbest), and the best position found by any particle of the swarm (gbest).

Motivation and Contribution

Problem Formulation
The cost of traversing the arc between customers i and j is taken as the Euclidean distance d_ij = √((x_i − x_j)² + (y_i − y_j)²), where x_i is the x coordinate of customer i and y_i is the y coordinate of customer i.
The VSP consists of designing m vehicle routes on G such that:
 each customer is visited exactly once;
 the total demand of any route does not exceed the vehicle capacity q;
 the length of any route does not exceed a pre-set maximal route length L;
 in some versions m is fixed a priori, in others it is a decision variable;
 the total cost of all vehicle routes is minimized.
The VSP is more complex than the Travelling Salesman Problem (TSP), since every route in the VSP is itself a TSP. In general, if there are k vehicles in a VSP, then k TSPs have to be solved in order to obtain the best set of routes for the k vehicles.

The Mathematical Problem
In ACO algorithms an ant represents a vehicle. We concentrate on the way every ant chooses its next node. This selection depends on the pheromone quantity deposited on every arc and on the traversal cost of that arc. An ant may act in two possible ways: either it deterministically chooses the arc with the best efficiency (highest pheromone quantity, lowest cost), or it chooses among the candidate nodes according to a probability function. In the first option, an ant k currently at node i moves to the next node j by applying the state transition rule (Bonabeau et al. 1999):

j = argmax_{u ∈ J_i^k} { τ_iu(t) · [η_iu]^β },

where τ_ij(t) is the amount of pheromone on arc (i,j) at time t, η_ij is the inverse of the distance between nodes i and j, called visibility, β is a parameter which controls the relative weight of the visibility, and J_i^k is the set of nodes to which ant k can move when located at node i. In the second option, the ant moves to a node J drawn using the probabilities

p_ij^k(t) = τ_ij(t) · [η_ij]^β / Σ_{l ∈ J_i^k} τ_il(t) · [η_il]^β.

Every ant chooses one of the two ways depending on a second probability, which in the current study is zero for the first case and one for the second; that is, we follow the probabilistic rule for the selection of the next arc. The pheromone on every arc is updated after its traversal by

τ_ij(t+1) = (1 − ρ) · τ_ij(t) + ρ · τ_0,

where ρ is the pheromone decay coefficient and τ_0 is the initial amount of pheromone on arc (i,j). When a cycle ends, the value of the objective function of the problem is calculated using the following equation:

Z = Σ_{(i,j)} λ_ij · δ_ij,   (4)

where:
 λ_ij: the number of traversals of the arc joining nodes i and j;
 δ_ij: the cost of traversing the arc.
The value of the objective function thus arises from adding the cost of every arc used, multiplied by the number of times it has been traversed. Building a path network with the lowest cost amounts to minimizing equation (4). In this study the parameter β (initial value β = 1) and the initial pheromone quantity are determined by the user, whereas ρ is zero by default, which means that evaporation is not taken into account here.
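The two selection modes and the arc-level pheromone update described above can be sketched as follows. This is an illustrative fragment under the stated assumptions (the β and q0 values are examples), not the program used in the case study.

```python
import random

def next_node(i, candidates, tau, eta, beta=1.0, q0=0.9):
    """State transition: with probability q0 exploit the arc maximizing
    tau[i][j] * eta[i][j]**beta; otherwise draw j from the distribution
    proportional to the same quantity (biased exploration)."""
    weights = {j: tau[i][j] * eta[i][j] ** beta for j in candidates}
    if random.random() < q0:
        return max(weights, key=weights.get)          # deterministic choice
    total = sum(weights.values())
    r = random.uniform(0, total)                      # roulette-wheel draw
    acc = 0.0
    for j, w in weights.items():
        acc += w
        if acc >= r:
            return j
    return j

def local_update(tau, i, j, rho, tau0):
    """Pheromone update on a traversed arc: decay toward tau0.
    With rho = 0 (as in this study) the trail is left unchanged."""
    tau[i][j] = (1 - rho) * tau[i][j] + rho * tau0
```

Setting q0 = 0 reproduces the purely probabilistic rule followed in this chapter; q0 = 1 gives the purely greedy rule.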

Case Studies
A computer program was constructed to demonstrate how the concepts of the previously described algorithm can be applied in practice, and to test, validate and finally fine-tune the proposed algorithm. The results obtained from this program are illustrated using the input data of Table 6; the capacity of each agent is 50. After several tests of the model, a graph emerged (Figure 20) which revealed many interesting findings about the efficiency of the computer program for our example and provided a means to assess the behavior of the process over a time horizon, measured here in cycles. Three initial pheromone levels were tested: 50, 500 and 4000. In all cases the objective function values tend to stabilize or approach some limit value. With a very small initial pheromone quantity (50), the value of the objective function after the end of the first cycle was lower than in the other two cases, but very little or no improvement was noticed after the end of the remaining cycles. The initial pheromone level was finally set at 500, at which we obtained the best results of the simulation process.

Conclusions
From the results presented above we can draw some conclusions that relate not only to the current study, but to the Ant Colony Algorithm in general.
It is obvious that the quality of the results is closely related to the initial values of the simulation process. A very important factor in every implementation of the ant colony algorithm is the correct definition of the initial pheromone quantity, which determines whether some paths are favored over others. Beyond a certain point it is extremely difficult (if not impossible) to find a better solution, so the lowest value found at that moment is assumed to be the best.

Comparison of the Metaheuristic ACS and MMAS Algorithms: An Optimized Dartboard Design Application
The problem of optimally locating the numbers around a dartboard is a combinatorial optimization problem. In this study we solve it using the Ant Colony System and the Max-Min Ant System (MMAS) algorithms. MMAS reinforces local search in the neighborhood of the best solution found in each iteration, while implementing mechanisms that slow convergence and facilitate exploration. Both algorithms have proved to be very effective in finding optimal solutions to hard combinatorial optimization problems.

Motivation and Contribution
The game of darts is one of the most popular games in the world. It was invented during the 16th century, but the method of score calculation is attributed to Brian Gamlin (1896), as cited in [1]. Many variants were later invented, the most important being the one where the players try to reduce an initial score of 301 to 0 points [1]. The most common form of the dartboard is depicted in Fig. 21. Dartboard design [2] concerns the placement of the numbers in the sectors of a circular board [3,4,5,6]. In this study we solve the design problem using Ant Colony Optimization (ACO) algorithms [9,10,11,12,13,14], namely the Ant System algorithm [10] and the Max-Min Ant System (MMAS) algorithm [9,15], and compare the results of each.

Problem Formulation
Players aim at specific points rather than at the whole dartboard. The deviation of the points actually gained from those aimed at defines the standard deviation σ, which is inversely proportional to the player's accuracy [3]. The sector the player should normally aim at gives 20 points (Fig. 21). Let π(k) be the number located at position k of the dartboard, starting from an arbitrary position, where π(1), π(2), …, π(20) is a permutation of the numbers 1, 2, …, 20. If a player aims at π(k) but may hit one of the two adjacent sectors, π(k−1) with probability 1/2 and π(k+1) with probability 1/2, the expected standard deviation from the aimed points is given in [8].

The aim is the maximization of the total expected deviation, when all 20 numbers constitute targets hit with equal probabilities. Two objective functions have to be maximized over the set of permutations π. Since every term would otherwise be calculated twice while the aiming probability is a constant, the objective functions are expressed as:

z1(π) = Σ_{k=1}^{20} |π(k) − π(k+1)|,   z2(π) = Σ_{k=1}^{20} (π(k) − π(k+1))²,   with π(21) = π(1).   (3)
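The two objectives can be evaluated directly for any permutation. As a sanity check, the sketch below computes them, assuming the cyclic form (sums of absolute and squared differences of adjacent numbers), for the standard Gamlin ordering; the code is an illustration, not the dissertation's program.

```python
def z1(perm):
    """Cyclic sum of absolute differences between adjacent dartboard numbers."""
    n = len(perm)
    return sum(abs(perm[k] - perm[(k + 1) % n]) for k in range(n))

def z2(perm):
    """Cyclic sum of squared differences between adjacent dartboard numbers."""
    n = len(perm)
    return sum((perm[k] - perm[(k + 1) % n]) ** 2 for k in range(n))

# The standard Gamlin dartboard ordering, clockwise from the top:
standard = [20, 1, 18, 4, 13, 6, 10, 15, 2, 17, 3, 19, 7, 16, 8, 11, 14, 9, 12, 5]
print(z1(standard), z2(standard))  # → 198 2478
```

An ant in the algorithms below builds such a permutation number by number, and its tour cost is exactly one of these sums.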

AS and MMAS Algorithms Overview
Both algorithms were inspired by observing the behavior of real ant colonies [11] and were first applied to the classic Travelling Salesman Problem (TSP) [15,16]. A set of agents, called artificial ants, cooperate to find good solutions to the TSP using an indirect model of communication through pheromone trails, which they deposit on the edges of the TSP graph while constructing solutions. The most important part of both algorithms is the movement of the ants.

AS and MMAS Algorithms' Common Features
 η_ij: the heuristic information, called visibility, which expresses the desirability of edge (i,j). It is defined as the inverse of the distance between nodes i and j.
 τ_ij: the pheromone trail, which expresses the concentration of pheromone on edge (i,j).
 Coefficients α, β: the parameters which control the relative influence of the pheromone trail and the heuristic information; they are defined by the user.

Ant-System (AS) Algorithm Analysis
The Ant System algorithm [11,12] is the first member of the family of Ant Colony Optimization algorithms. Its main characteristics are the following:
 The algorithm uses a list M, known as the tabu list, to store all visited nodes. Initially the tabu list contains only the start node.
 After each iteration, every ant k deposits on the arcs of its tour a pheromone quantity Δτ_ij^k = Q / L_k, where:
 Q: a constant representing the pheromone addition factor;
 L_k: the tour length of the kth ant.
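A minimal sketch of these two AS ingredients, the tabu-restricted transition probabilities and the Q/L_k pheromone deposit, might look as follows; the parameter values are illustrative assumptions, not the settings of the study.

```python
def as_probabilities(i, tau, eta, tabu, alpha=1.0, beta=2.0):
    """AS transition probabilities from node i over nodes not yet visited
    (i.e. not in the tabu list): p_ij ∝ tau_ij^alpha * eta_ij^beta."""
    allowed = [j for j in tau[i] if j not in tabu]
    w = {j: (tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in allowed}
    s = sum(w.values())
    return {j: wj / s for j, wj in w.items()}

def deposit(tau, tour, length, Q=1.0):
    """Each ant deposits Q / L_k on every arc of its closed tour."""
    for i, j in zip(tour, tour[1:] + tour[:1]):
        tau[i][j] += Q / length
```

In a full AS run, every ant draws its next node from `as_probabilities`, appends it to its tabu list, and calls `deposit` once its tour is complete.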

Max-Min Ant System (MMAS) Algorithm
The MMAS algorithm [9,15,16,18] is a direct improvement over the AS algorithm [11,12,19]. The main modifications of MMAS with respect to AS are the following: 1. Only one ant is allowed to add pheromone in each iteration, in order to exploit the best solutions found; this may be the ant which found the best solution in the current iteration (iteration-best ant) or the one which found the best solution from the beginning of the trial (global-best ant). 2. The pheromone trail values are limited to an interval [τ_min, τ_max], in order to avoid stagnation of the search. 3. The pheromone trails are initialized to the upper trail limit τ_max, which causes higher exploration at the start of the algorithm.
Solutions in MMAS are constructed in exactly the same way as in AS. The decision for the next edge to follow is given by the probability

p_ij^k = [τ_ij]^α · [η_ij]^β / Σ_{l ∈ N_i^k} [τ_il]^α · [η_il]^β,

and the pheromone trails are updated by

τ_ij ← (1 − ρ) · τ_ij + Δτ_ij^best,

where ρ is a parameter that models pheromone evaporation, with ρ ∈ (0,1], and Δτ_ij^best = 1 / f(s^best), with f(s^best) the solution cost of the iteration-best (or global-best) ant.
When the algorithm converges, the pheromone trail smoothing mechanism can be activated. It increases pheromone levels in proportion to their difference from τ_max, so that the selection probability of trails with low pheromone levels increases. This is done as follows:

τ_ij* = τ_ij + δ · (τ_max − τ_ij),

where δ is a parameter set by the user with 0 ≤ δ ≤ 1. For δ = 1 the pheromone levels are reinitialized, and for δ = 0 the mechanism becomes inactive. (This mechanism is used mainly in runs with a large number of iterations.)
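The MMAS update, the trail limits and the smoothing mechanism can be sketched as follows; this is an illustration under the stated assumptions (iteration-best deposit of 1/f(s_best)), not the program used in the comparison.

```python
def mmas_update(tau, best_tour, best_cost, rho, tau_min, tau_max):
    """MMAS global update: evaporate all trails, deposit 1/f(s_best) on the
    best tour's arcs only, then clamp every trail to [tau_min, tau_max]."""
    for i in tau:
        for j in tau[i]:
            tau[i][j] *= (1 - rho)
    for i, j in zip(best_tour, best_tour[1:] + best_tour[:1]):
        tau[i][j] += 1.0 / best_cost
    for i in tau:
        for j in tau[i]:
            tau[i][j] = min(tau_max, max(tau_min, tau[i][j]))

def smooth(tau, tau_max, delta):
    """Pheromone trail smoothing: tau* = tau + delta * (tau_max - tau).
    delta = 1 reinitializes all trails to tau_max; delta = 0 is a no-op."""
    for i in tau:
        for j in tau[i]:
            tau[i][j] += delta * (tau_max - tau[i][j])
```

Clamping to [τ_min, τ_max] is what prevents the stagnation observed in plain AS when a few trails accumulate all the pheromone.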

AS and MMAS algorithm Implementation for Optimal location
First we present the assumptions for each algorithm, followed by the steps used to solve our problem.

Ant System Algorithm Assumptions
20 ants were used, as many as the numbers that must be located on the dartboard.
100 iterations of the algorithm were executed. The value of the parameter Q is equal to 1.

Ant System Algorithm
Step-Analysis
 Step 2: Using the transition rule, the next number that will be located on the dartboard is evaluated for every ant.
Step 2 is repeated until all ants complete their tour.

 Step 3:
The cost of all tours is calculated; this is a measure expressing how much a failure will cost a player. The maximum cost is registered if it gives a better solution than the one already found.

 Step 4:
Pheromone is updated at all trails used by the ants.

 Step 5:
The process is repeated from Step 2 until 100 iterations are executed.

Max-Min Ant Algorithm Assumptions
20 ants were used for the implementation of the program, as many as the numbers that must be located on the dartboard. 100 iterations of the algorithm were executed. The upper pheromone limit is calculated by τ_max = 1 / (ρ · f(s^best)), where f(s^best) is the best solution cost in the corresponding iteration. The lower limit τ_min is derived from τ_max as the limit of a sequence depending on a parameter p_best ∈ (0,1), which is determined by the user.

Max-Min Ant Algorithm
Step-Analysis  Step 2: Using the transition rule, the next number that will be located on the dartboard is evaluated for every ant.
Step 2 is repeated until all ants fulfill their trip.
 Step 3: The cost of the tours is calculated; this is a measure expressing how much a failure will cost a player. The maximum cost is registered if it gives a better solution than the one already found.
 Step 4: Update the pheromone using the MMAS update rule, keeping the trail values within the limits [τ_min, τ_max].

 Step 5:
The process is repeated from Step 2 until 100 iterations are executed.


Step 6 (optional): When the algorithm appears to converge, smoothing of the pheromone trails can take place.

Case Studies -Optimization
Both algorithms (AS and MMAS) were implemented to solve the problem of locating the numbers on the dartboard in an optimal way, and their results were compared. Each algorithm ran for 4,851 runs, covering all possible combinations of the parameters α, β, ρ: α = 0…20 with step 1, β = 0…20 with step 1, and ρ = 0…1 with step 0.1. We decided to include the extreme values 0 and 1 of the evaporation ρ in order to observe the behavior and the results of both algorithms. Each algorithm's best costs for z1, z2 are shown in Table 7. For the function z1, both algorithms give, for most combinations of α, β, ρ, a constant best cost equal to 199. For the function z2, AS gives its best solution for ρ = 0, which can be considered a special case, since it means that pheromone accumulates on the trails without evaporating; as a result, these trails are continuously reinforced while the others are progressively excluded by the colony. Each algorithm found its nine best solutions, in most cases for more than one combination of the parameters. The nine best costs are:

  AS (z2)   z1    MMAS (z2)
  2623      199   2632
  2622      199   2630
  2621      199   2629
  2618      199   2628
  2617      199   2627
  2616      199   2626
  2614      199   2625
  2613      199   2624
  2612      199   2623

For the implementations of the AS and MMAS algorithms, Tables 8 & 9 show the number of parameter combinations giving the nine best solutions. The best combinations of the parameters (α, β, ρ), based on the number of iterations needed to find the nine best costs for z1, z2, are presented in Tables 10 & 11. Finally, Tables 12 & 13 present the nine best solutions for the allocation of the numbers on the dartboard.

Conclusions
The problem refers to the dartboard game, where the 20 numbers from 1 to 20 must be arranged in such an order around the board that the players' penalty for failure is maximized. The AS and MMAS algorithms were used to solve the problem; MMAS proved to be much faster and provided better solutions. Both algorithms were tested on a desktop with 8 GB of RAM and an Intel Core i7 processor, and were timed over the total of 4,851 combinations of α, β, ρ. Both algorithms give for the objective function z1 a best value that is constant and equal to 199, independently of the values of the parameters α, β and ρ. For the objective function z2, MMAS provides a better best cost of 2632, while AS gives 2623 (and only under special circumstances, when the evaporation coefficient is equal to zero). We also noticed that MMAS reaches the best cost early in the run, whereas AS converges more slowly. Finally, it was noticed that the best cost values were obtained not exclusively from one particular combination of the parameters α, β and ρ, but from a multitude of their combinations.

Conclusions and Future Directions
The research goals addressed in this doctoral dissertation relate to the extensive research and development effort required when applying combinatorial optimization in intelligent multi-agent systems, in order to achieve cooperation, interoperability and sustainability in heterogeneous and complex existing or future designs of industrial, aerospace, robotic and other cyber-physical systems.
In the first part of this doctoral dissertation, consisting of chapters 1, 2 and 3, the metaheuristic approaches in the area of AI algorithms, and specifically particle swarm and ant colony optimization for solving combinatorial optimization problems, were presented and discussed.
In the second part, three implementations of metaheuristic algorithms are presented in detail. In chapter 4, the dartboard design problem was described within the spectrum of metaheuristic algorithms, and the Ant Colony System and Max-Min Ant System algorithms were applied as metaheuristic strategies guiding the search process, in an effort to reinforce local search in the neighborhood of the best solution found in each iteration. Each of the proposed enhanced interoperable and cooperative solutions contributes towards combinatorial problems that industry and the research community have highly acknowledged. As cyber-physical systems evolve, efficiency, quality, safety, security, trustworthiness and optimized solutions are major concerns for industry, developers and researchers. Since MATLAB is one of the most common and effective tools for improving all aspects of such systems, a wider deployment of multi-agent systems with optimal intelligence, capable of providing cross-border mobility, would be a great challenge addressing reliability, time-sensitivity, quality and continuity. Thus, the sustainability of the entire ICT domain, including cyber-physical systems and SCADA, in the era of the Internet of Things needs to be re-examined from the perspective of AI. Turing's original question, whether machines can think, not only remains but also expands: if they can think, which types of machines should? In addition, although the concept of intelligent multi-agent systems is not new, their commercial success over recent years has played a major role in the aerospace domain. In the coming years, as future global aerospace transportation systems expand beyond known capabilities, AI will be necessary for resource provisioning (pooling management).
Therefore, for the transition of the ICT era to the "post-PC" era of an intelligent cyber-space and the cyber-physical microcosm to be sustained, fully connected and self-optimized, both technological and non-technological issues related to the evolution of AI need to be identified. Related technological issues include:
 optimization of scaled interoperability and elastic scalability, currently restricted by inefficiently implemented resource capabilities;
 efficient handling of big data, where the diversity of data leads to consistency and efficiency issues;
 rapid design and development simplicity in the solutions provided.
Non-technological issues include:
 economic aspects, covering knowledge about when, why and how to use AI in computing systems and technology;
 aspects related to green IT and "green capabilities", reducing unnecessary power consumption;
 worldwide harmonized regulations for protecting the environment.

APPENDIX-I Vehicle Routing Problem (VRP) -Generic Approach and Algorithmic Implementation
The Vehicle Routing Problem (VRP) is a combinatorial optimization and integer programming problem that seeks to service a number of customers with a fleet of vehicles, arising in the fields of transportation, distribution and logistics. The name covers a whole class of problems in which a set of routes for a fleet of vehicles, based at one or several depots, must be determined for a number of geographically dispersed cities or customers. The objective of the VRP is to serve a set of customers with known demands using minimum-cost vehicle routes originating and terminating at a depot. Figure 23 illustrates a typical input for a VRP and one of its possible outputs. Formulation: the VRP is a combinatorial problem whose ground set is the edge set of a graph ${G(V,E)}$. The notation used for this problem is as follows: ${V = \left\lbrace v_{0}, v_{1}, …, v_{n} \right\rbrace}$ is the vertex set, where the depot is located at ${v_0}$. Nearly all solution techniques are heuristics and metaheuristics, because no exact algorithm can be guaranteed to find optimal routes within reasonable computing time when the number of customers is large; this is due to the NP-hardness of the problem.
A classification of the solution techniques considered follows: Metaheuristics  Ant Algorithms, Constraint Programming, Deterministic Annealing, Genetic Algorithms, Simulated Annealing, Tabu Search  Granular Tabu, the adaptive memory procedure, Kelly and Xu.