Abstract:
Cloud storage system architecture and design play a vital role in cloud computing infrastructure, improving both storage capacity and cost effectiveness. To address this need, a cost-effective, PC-cluster-based storage server is configured to serve large amounts of data to cloud users and is implemented with the Hadoop Distributed File System (HDFS). HDFS is an open-source distributed file system designed to run on low-cost hardware. In this system, the HDFS access mechanism causes high access latency. Data prefetching is an effective technique for improving file access performance because it can reduce I/O response time. To address this latency, we propose a data prefetching algorithm based on the FP-growth algorithm. It extracts frequent access patterns from users' historical access records in order to reduce communication between clients and the NameNode. Based on these frequent access patterns, frequently accessed data are stored in the client cache.
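The following is a minimal sketch of how FP-growth can turn historical access records into prefetch decisions. It is an illustration under stated assumptions, not the authors' implementation. Assumed here: each access session is a list of block or file IDs (strings), `min_support` is an absolute count, and the hypothetical helper `prefetch_candidates` ranks blocks that appear in a frequent itemset together with the block just accessed. The mining step itself is standard FP-growth. It builds a prefix tree of frequency-ordered sessions, then recursively mines conditional pattern bases.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its count, a parent link, and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from a list of (items, count) pairs.

    Returns the header table (item -> nodes holding it) and the supports
    of the items that meet min_support.
    """
    freq = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_support}
    root = Node(None, None)
    header = defaultdict(list)
    for items, count in transactions:
        # Keep only frequent items, ordered by descending support (ties by name).
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, freq


def fp_growth(transactions, min_support, suffix=()):
    """Yield (frequent itemset, support) pairs; transactions must be a list."""
    header, freq = build_tree(transactions, min_support)
    # Mine items from least to most frequent.
    for item in sorted(freq, key=lambda i: (freq[i], i)):
        yield tuple(sorted(suffix + (item,))), freq[item]
        # Conditional pattern base: prefix paths that lead to this item.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        if base:
            yield from fp_growth(base, min_support, suffix + (item,))


def prefetch_candidates(patterns, accessed):
    """Hypothetical policy: blocks that co-occur with `accessed`, best first."""
    best = {}
    for itemset, support in patterns.items():
        if accessed in itemset and len(itemset) > 1:
            for other in itemset:
                if other != accessed:
                    best[other] = max(best.get(other, 0), support)
    return sorted(best, key=lambda b: (-best[b], b))


# Hypothetical access sessions (block IDs), each with weight 1.
sessions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"], ["a", "b", "d"]]
patterns = dict(fp_growth([(s, 1) for s in sessions], min_support=2))
print(prefetch_candidates(patterns, "a"))  # → ['b', 'c']
```

When a client reads block `a`, this policy would fetch `b` and then `c` into the client cache. Later reads of those blocks could then skip a round trip to the NameNode. How many candidates to prefetch, and how to evict them, are further assumptions left open here.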