Many applications use collective I/O operations to read data from and write data to disk. One of the most widely used is the Two-Phase I/O technique, extended by Thakur and Choudhary in ROMIO. Two-Phase I/O takes place in two phases: a data redistribution (exchange) phase and an I/O phase. In the first phase, small file requests are grouped into larger ones by means of communication. In the second phase, contiguous transfers are performed to or from the file system. Before these phases, Two-Phase I/O divides the file into equal contiguous parts, called File Domains (FDs), and assigns each FD to one of a configurable number of compute nodes, called aggregators. Each aggregator is responsible for aggregating all the data that maps into its assigned FD and for transferring that FD to or from the file system. In the default implementation of Two-Phase I/O, the assignment of each aggregator (the aggregator pattern) is fixed and independent of the distribution of data over the processes. This fixed aggregator pattern might create an I/O bottleneck, because each aggregator must perform multiple requests to collect all the data assigned to its FD.
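The division of the file into equal contiguous File Domains can be illustrated with a minimal sketch. The function name `file_domains`, the half-open byte ranges, and the handling of a remainder in the last FD are assumptions made for illustration; the actual ROMIO code may align or round the domains differently.

```python
def file_domains(start, end, n_aggregators):
    """Split the byte range [start, end) into contiguous, (nearly) equal
    File Domains, one per aggregator.

    Hypothetical helper for illustration only: the last domains may be
    shorter (or empty) when the range does not divide evenly.
    """
    if n_aggregators <= 0:
        raise ValueError("n_aggregators must be positive")
    total = end - start
    size = -(-total // n_aggregators)  # ceiling division
    domains = []
    for i in range(n_aggregators):
        lo = min(start + i * size, end)
        hi = min(lo + size, end)
        domains.append((lo, hi))
    return domains


def domain_of(offset, domains):
    """Return the index of the File Domain that contains a byte offset."""
    for i, (lo, hi) in enumerate(domains):
        if lo <= offset < hi:
            return i
    raise ValueError("offset outside the accessed file range")
```

With a fixed aggregator pattern, FD `i` would always go to the same preselected node, regardless of which nodes actually hold the data that falls inside `domains[i]`.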
Therefore, we proposed replacing the rigid assignment of aggregators over the processes with two new aggregation criteria:
• Aggregation-by-communication-number (ACN): This criterion assigns each aggregator to the node that holds the highest number of contiguous data blocks of the file domain associated with the aggregator.
• Aggregation-by-volume-number (AVN): This criterion assigns each aggregator to the node that holds the most data of the file domain associated with the aggregator.
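The two criteria can be sketched as follows. This is an illustrative sketch and not the MPICH2 modification itself: the names `overlap` and `choose_aggregators`, the representation of requests as `(offset, length)` blocks per node, and the tie-breaking rule (lowest node id wins) are all assumptions. The sketch also does not address what happens when the same node is selected for several file domains.

```python
def overlap(block, domain):
    """Number of bytes of block (offset, length) inside domain [lo, hi)."""
    offset, length = block
    lo, hi = domain
    return max(0, min(offset + length, hi) - max(offset, lo))


def choose_aggregators(requests, domains, criterion):
    """Pick one aggregator node per File Domain.

    requests:  dict mapping node id -> list of (offset, length) blocks
    domains:   list of (lo, hi) half-open byte ranges, one per aggregator
    criterion: "ACN" (most contiguous blocks in the domain) or
               "AVN" (most bytes in the domain)
    Ties go to the lowest node id (an assumption of this sketch).
    """
    if criterion not in ("ACN", "AVN"):
        raise ValueError("criterion must be 'ACN' or 'AVN'")
    chosen = []
    for domain in domains:
        best_node, best_score = None, -1
        for node in sorted(requests):
            sizes = [overlap(block, domain) for block in requests[node]]
            if criterion == "ACN":
                score = sum(1 for n in sizes if n > 0)  # block count
            else:
                score = sum(sizes)                      # data volume
            if score > best_score:
                best_node, best_score = node, score
        chosen.append(best_node)
    return chosen


# Two file domains, two nodes with different access patterns.
domains = [(0, 100), (100, 200)]
requests = {
    0: [(0, 10), (20, 10), (40, 10), (100, 60)],  # many small blocks, one large
    1: [(50, 50), (160, 10), (180, 10)],          # one large block, two small
}
print(choose_aggregators(requests, domains, "ACN"))  # [0, 1]
print(choose_aggregators(requests, domains, "AVN"))  # [1, 0]
```

In this example the two criteria pick opposite nodes for each domain, which shows why the better choice depends on the access pattern of the application.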
We have implemented both aggregator patterns by modifying the Two-Phase I/O implementation of MPICH2. In most cases, the modified Two-Phase I/O achieves a speedup between 1.2 and 1.3 with one of the proposed aggregator patterns. This means that, with the appropriate aggregator pattern (ACN or AVN), the number or volume of communications is reduced and, therefore, the overall execution time is also reduced.
Now we want to develop a new intelligent I/O technique that selects, at run time, the better aggregator pattern (ACN or AVN), depending on the characteristics of the application.