Abstract in another language
The explosive growth of the Web has dramatically changed the way in which information is managed and accessed. In particular, the Web has evolved rapidly from a simple information-sharing environment (offering only static text and images) into a rich framework of dynamic and interactive services (such as video/audio conferencing, e-commerce, and distance learning). This enormous growth and diversity in terms of access devices, bandwidth, information sources, and content has complicated Web data management frameworks and practices. In this context, a variety of Web data management techniques and mechanisms has become necessary, both to provide information that is actually useful to users and to improve information circulation and dissemination over the Web. Furthermore, new tools and techniques are needed to manage this data effectively, since managing Web data with conventional tools is becoming almost impossible. The contribution of this dissertation focuses on the following subjects.

Chapter 3 deals with the problem of assessing the quality of user session clusters in order to make inferences about users' navigation behaviour. Understanding users' navigation on the Web is important for improving the quality of information and the speed of access to large-scale Web data sources. Clustering users' navigation sessions has been proposed as a way to identify patterns and similarities, which are then exploited in Web-user-oriented applications (searching, e-commerce, etc.). In this Chapter, a common model-based clustering algorithm is used to produce clusters of Web users' sessions. These clusters are validated using a statistical test that measures the distances between the clusters' distributions in order to infer their similarity. Furthermore, a visualization method is proposed for interpreting the relations between clusters.
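The cluster validation step described above can be illustrated with a small sketch. The function below computes a two-sample Kolmogorov-Smirnov statistic between the empirical distributions of a session feature (e.g. session length) in two clusters; this is one plausible choice of distance measure, not necessarily the exact statistical test used in the dissertation.

```python
import bisect

def distribution_distance(sessions_a, sessions_b):
    """Two-sample Kolmogorov-Smirnov statistic between the empirical
    distributions of a session feature in two clusters.
    Stdlib-only sketch; the dissertation's actual test may differ in form."""
    a, b = sorted(sessions_a), sorted(sessions_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        fa = bisect.bisect_right(a, x) / len(a)   # empirical CDF of cluster A at x
        fb = bisect.bisect_right(b, x) / len(b)   # empirical CDF of cluster B at x
        d = max(d, abs(fa - fb))
    return d   # 0.0 = identical distributions, 1.0 = fully separated
```

Small distances between two clusters' distributions would suggest the clusters capture similar navigation behaviour.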
Using real data sets, it is shown that the proposed analysis is robust and effective, uncovering valuable associations among Web users' navigation sessions.

Chapter 4 deals with issues related to Web data caching and prefetching. First, a new cache replacement algorithm is presented, which identifies the objects that should be evicted by jointly considering three important criteria: an object's frequency, recency, and size. Experimentation over a synthetic workload has shown that the proposed algorithm achieves higher hit rates than the most widely used and recently proposed algorithms. Then, a clustering-based prefetching scheme is presented, in which a novel clustering algorithm identifies clusters of "correlated" Web objects without requiring the number of clusters to be determined in advance. This scheme can easily be integrated into a Web proxy server, improving its performance. Through a simulation environment, using a real data set, it is shown that the proposed framework is robust and effective in reducing the user-perceived latency.

Chapter 5 studies some crucial content management issues for Content Distribution Networks (CDNs). In general, a CDN is a set of servers (distributed around the world) that replicate the origin servers' content. One of the most important issues for a CDN is to identify the content that should be outsourced for replication to its servers. To address this issue, self-adaptive techniques are developed which require no a priori knowledge of request statistics. These techniques identify clusters of "correlated" Web pages in a site, called Web site communities, and make these communities the basic outsourcing unit. Through a detailed simulation environment, using both real and synthetic data, the proposed techniques prove to be very robust and effective in reducing the user-perceived latency, performing very close to an infeasible, off-line policy that has full knowledge of the content popularity.
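The notion of Web site communities above can be sketched in miniature. The function below groups a site's pages into communities as connected components of a correlation graph; this is an illustrative simplification, not the dissertation's exact community definition.

```python
from collections import defaultdict

def site_communities(pages, correlated):
    """Group a site's pages into "communities" of correlated pages.
    Illustrative sketch: plain connected components over a correlation
    graph, not the dissertation's exact self-adaptive technique.

    pages:      iterable of page ids
    correlated: iterable of (page_a, page_b) pairs judged correlated
                (e.g. frequently co-requested)
    """
    adj = defaultdict(set)
    for a, b in correlated:
        adj[a].add(b)
        adj[b].add(a)
    seen, communities = set(), []
    for p in pages:
        if p in seen:
            continue
        stack, comm = [p], set()   # depth-first traversal from p
        while stack:
            q = stack.pop()
            if q in comm:
                continue
            comm.add(q)
            stack.extend(adj[q] - comm)
        seen |= comm
        communities.append(comm)
    return communities
```

Each resulting community would then serve as the basic unit that is outsourced for replication as a whole.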
Another important issue with which this Chapter deals is identifying the optimal placement of the outsourced content on the CDN's servers. Since this problem is NP-complete, a heuristic method must be developed. All the approaches developed so far take as their criterion either the network latency or the workload. In this framework, two novel techniques are presented for placing the outsourced content on the CDN's servers. In the first, the outsourced objects are placed on the CDN's servers with respect to the network latency that each object produces, whereas in the second the objects are placed by integrating both the latency and the load. Through a detailed simulation environment, using both real and synthetic data, it is shown that the proposed methods can significantly improve the response time of requests while keeping the load on the CDN's servers at a very low level.

Chapter 6 presents a modeling and simulation framework for CDNs, called CDNsim. CDNsim simulates in great detail the main characteristics of the CDN infrastructure model as well as the TCP/IP protocol. The purpose of the CDNsim simulation tool is to give a (closely) realistic view of a CDN environment that can serve as a testbed for CDN evaluation and experimentation. This is quite useful both for the research community (to experiment with new CDN data management techniques) and for CDN developers (to evaluate the expected benefits prior to an actual CDN installation). Finally, Chapter 7 concludes this dissertation and outlines extensions and directions for future work.
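The latency-driven placement idea of Chapter 5 can be sketched as a greedy heuristic. The function below assigns each outsourced object to the surrogate server that serves it at the lowest expected latency, subject to server storage capacity; it is an illustrative sketch of a latency-based placement heuristic, not the dissertation's exact technique, and all parameter names are assumptions.

```python
def place_objects(objects, servers, latency, capacity):
    """Greedy latency-driven placement sketch (illustrative only).

    objects:  {obj: size in bytes}
    servers:  list of surrogate server ids
    latency:  {(obj, server): expected latency if obj is served from server}
    capacity: {server: remaining storage in bytes} (mutated in place)
    """
    placement = {}
    # Handle the most latency-critical objects first: those whose best
    # achievable latency is highest stand to lose the most if skipped.
    for obj in sorted(objects,
                      key=lambda o: -min(latency[(o, s)] for s in servers)):
        candidates = [s for s in servers if capacity[s] >= objects[obj]]
        if not candidates:
            continue                      # no server has room; leave at origin
        best = min(candidates, key=lambda s: latency[(obj, s)])
        placement[obj] = best
        capacity[best] -= objects[obj]
    return placement
```

The dissertation's second technique would additionally fold each server's current load into the per-server cost, rather than using latency alone.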