A method for high throughput deduplication for primary file servers by using pre-fetch cache

Hitoshi Kamei, Takaki Nakamura

    Research output: Contribution to journal / Article / peer-review

    Abstract

    We propose a method for high-throughput file-level deduplication on primary file servers, called partial data background pre-fetch (PDBP). To achieve high deduplication throughput, the method reduces the number of disk I/Os issued during the deduplication process. Before the deduplication process runs, the method pre-fetches part of the data of the shared files referenced by deduplicated files. It then processes the files that are larger than a file size threshold defined by administrators. We evaluate the deduplication processing time using a simulation model of PDBP and confirm that PDBP reduces the processing time by about 50% compared with a conventional file deduplication method when the threshold is set to 4 KB.
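
    The abstract does not spell out how the pre-fetched data is used, so the following is only a toy sketch under stated assumptions, not the authors' implementation. It assumes the cached leading bytes of each shared file are used to reject non-matching candidates before any full read of a shared file is issued, and it counts only those full reads as disk I/Os. The names `prefetch_heads`, `deduplicate`, `PREFETCH_BYTES`, and `SIZE_THRESHOLD` are hypothetical, and the byte-sized threshold stands in for the 4 KB setting mentioned above.

```python
from dataclasses import dataclass

PREFETCH_BYTES = 4   # hypothetical size of the pre-fetched part of each shared file
SIZE_THRESHOLD = 4   # hypothetical administrator-defined threshold (bytes in this toy)


@dataclass
class DedupStats:
    disk_reads: int = 0      # simulated full reads of shared files
    deduplicated: int = 0    # candidates replaced by a reference to a shared file


def prefetch_heads(shared_files, nbytes=PREFETCH_BYTES):
    """Cache the first nbytes of every shared file before deduplication runs."""
    return {name: data[:nbytes] for name, data in shared_files.items()}


def deduplicate(candidates, shared_files, head_cache, threshold=SIZE_THRESHOLD):
    """Link each large candidate to an identical shared file, if one exists.

    Assumption: a candidate whose leading bytes differ from the cached head
    of a shared file cannot be a duplicate, so that shared file is never read.
    """
    stats = DedupStats()
    links = {}
    for name, data in candidates.items():
        if len(data) <= threshold:
            continue  # only files larger than the threshold are processed
        for shared_name, head in head_cache.items():
            if data[:len(head)] != head:
                continue  # rejected from the cache, no disk read needed
            stats.disk_reads += 1  # simulate reading the full shared file
            if shared_files[shared_name] == data:
                links[name] = shared_name
                stats.deduplicated += 1
                break
    return links, stats
```

    In this sketch a candidate triggers a simulated disk read only when its leading bytes match a cached head, which is one plausible way a partial pre-fetch could cut the number of I/Os; the paper's simulation model may differ.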

    Original language: English
    Pages (from-to): 619-628
    Number of pages: 10
    Journal: IEEJ Transactions on Electronics, Information and Systems
    Volume: 135
    Issue number: 6
    Publication status: Published - 2015 Jun 1

    Keywords

    • File cache
    • File level deduplication
    • File system

    ASJC Scopus subject areas

    • Electrical and Electronic Engineering
