Cees-Jan

ReactPHP: Filesystem

This week we'll take a look at the react/filesystem package I'm developing under ReactPHP's flag. It's been a great adventure thus far and I want to share some things about this wicked project.

Important note

react/filesystem isn't done yet and things are subject to change. Nonetheless, I wanted to blog about it and show the current state of my work on it. (Writing this post has been a great experience and will flow back into the project's documentation.)

Installation

Installing the filesystem package is a little more complicated than the rest of the packages, as it requires ext-eio to function. ext-eio can be installed by running:

pecl install eio

After you installed the extension and added eio.so to your php.ini you can install react/filesystem using composer:

composer require react/filesystem

A word of caution

EIO uses threads to make async filesystem I/O possible and it will autoscale the number of threads needed for that. Most operating systems have an open file limit, and hitting that limit will result in an error. So don't try to open everything all at once; that limit is there for very good reasons. If you decide, like me, to raise that limit (mine is around half a million) and you still try to open as many files as possible, you will crash and burn your system. The HDD/SSD/SSHD will get swamped, unable to keep up and fulfill your requests, thus crashing the system. So don't ever raise that limit unless you know what you are doing and are willing to take the risks.
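
If you are curious what the limit is on your own machine, here is a minimal sketch for reading it from PHP. It assumes ext-posix is loaded (it isn't on Windows), and the openFileLimits helper is a name I made up for this post, not part of react/filesystem:

```php
<?php

// Hypothetical helper: returns the soft and hard open file limits of the
// current process, or null when ext-posix is not available.
function openFileLimits(): ?array
{
    if (!function_exists('posix_getrlimit')) {
        return null;
    }

    $limits = posix_getrlimit();
    if ($limits === false) {
        return null;
    }

    // Values are integers, or the string "unlimited".
    return [
        'soft' => $limits['soft openfiles'] ?? null,
        'hard' => $limits['hard openfiles'] ?? null,
    ];
}

$limits = openFileLimits();
if ($limits === null) {
    echo 'Could not read the limits (is ext-posix loaded?)', PHP_EOL;
} else {
    echo 'Soft open file limit: ', $limits['soft'], PHP_EOL;
    echo 'Hard open file limit: ', $limits['hard'], PHP_EOL;
}
```

The soft limit is the one your process actually runs into; the hard limit is the ceiling the soft limit can be raised to.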

Setting up the filesystem

Before we can use the filesystem we have to create it. (It is recommended to only create one, as for now the effects of creating more than one are unknown.) Setting it up is as simple as calling the create method on the Filesystem class. That will try to create a new adapter if you don't hand it one; in this case that's just the EIO adapter.

<?php

require dirname(__DIR__) . '/vendor/autoload.php';

$loop = \React\EventLoop\Factory::create();
$filesystem = \React\Filesystem\Filesystem::create($loop);

Listing directory contents

The filesystem has only a few methods. First, the dir method will create and return an object representing the given directory. (Note that the directory doesn't have to exist, as you can also create it (recursively) with the directory object.) For the first example we'll list the directory contents in the root of the examples project. That works by first creating the filesystem, then calling the dir method, and then doing an ls on that directory. Since this part of the API is async, a promise is returned. Once everything is listed, the promise will resolve with a list of nodes. These can be both files and directories:

<?php

require 'vendor/autoload.php';

$loop = \React\EventLoop\Factory::create();
$dir = \React\Filesystem\Filesystem::create($loop)->dir(dirname(__DIR__));
$dir->ls()->then(function (\SplObjectStorage $list) {
    foreach ($list as $node) {
        echo $node->getPath(), PHP_EOL;
    }
});

$loop->run();

Listing all PHP files in the examples repo

Now let's say we want all PHP files in a directory and its subdirectories. Instead of calling ls we'll call lsRecursive to get the entire directory tree from the given directory. Once the listing is in we'll use RegexIterator to get all files ending with .php. As you might notice we're also filtering out the files in vendor/, so they don't pollute our results:

<?php

require 'vendor/autoload.php';

$loop = \React\EventLoop\Factory::create();
$dir = \React\Filesystem\Filesystem::create($loop)->dir(dirname(__DIR__));
$dir->lsRecursive()->then(function (\SplObjectStorage $list) {
    $phpFiles = new RegexIterator($list, '/\.php$/');
    foreach ($phpFiles as $node) {
        if (strpos($node->getPath(), 'vendor') !== false) {
            continue;
        }
        echo $node->getPath(), PHP_EOL;
    }
});

$loop->run();
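
A side note on matching file extensions with a regular expression: an unescaped dot matches any character, so it pays to escape it. The sketch below uses plain PHP with made-up paths and a hypothetical matchingPaths helper, so it runs without react/filesystem or ext-eio:

```php
<?php

// Made-up paths for illustration; they are not from the examples repo.
$paths = ['src/Foo.php', 'docs/notes.txt', 'bin/runphp', 'lib/Bar.php'];

// Hypothetical helper: returns the paths matching the given pattern,
// in their original order.
function matchingPaths(array $paths, string $pattern): array
{
    $matches = new RegexIterator(new ArrayIterator($paths), $pattern);

    return iterator_to_array($matches, false);
}

// Unescaped dot: any character followed by "php" at the end matches.
$loose = matchingPaths($paths, '/.php$/');

// Escaped dot: only a literal ".php" at the end matches.
$strict = matchingPaths($paths, '/\.php$/');

echo 'loose:  ', implode(', ', $loose), PHP_EOL;
echo 'strict: ', implode(', ', $strict), PHP_EOL;
```

With the loose pattern bin/runphp slips through, while the strict pattern only keeps the two real .php files.
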

Getting the size of all PHP files

We have all the PHP files in this repo, but we'd like to know how big the files are and what their combined size is. The file object has a size method which lets you get that information; under the hood it uses a stat call, which reveals more information about an inode. You might notice the use of promise chaining, which makes it easy and clean to collect all the sizes:

<?php

require 'vendor/autoload.php';

$loop = \React\EventLoop\Factory::create();
$dir = \React\Filesystem\Filesystem::create($loop)->dir(dirname(__DIR__));
$dir->lsRecursive()->then(function (\SplObjectStorage $list) {
    $phpFiles = new RegexIterator($list, '/\.php$/');
    $promises = [];
    foreach ($phpFiles as $node) {
        if (strpos($node->getPath(), 'vendor') !== false) {
            continue;
        }
        $file = $node;
        $promises[] = $file->size()->then(function ($size) use ($file) {
            echo $file->getPath(), ': ', number_format($size / 1024, 2), 'KB', PHP_EOL;
            return $size;
        });
    }
    \React\Promise\all($promises)->then(function ($sizes) {
        $total = 0;
        foreach ($sizes as $size) {
            $total += $size;
        }
        echo 'Total: ', number_format($total / 1024, 2), 'KB', PHP_EOL;
    });
});

$loop->run();
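
The formatting and summing in that last callback can be tried out on their own. A small sketch in plain PHP, using made-up sizes and a hypothetical formatKilobytes helper; array_sum stands in for the foreach loop and gives the same total:

```php
<?php

// Hypothetical helper: formats a byte count the same way as the example,
// as kilobytes with two decimals.
function formatKilobytes(int $bytes): string
{
    return number_format($bytes / 1024, 2) . 'KB';
}

// Made-up sizes in bytes, standing in for the resolved size() promises.
$sizes = [1024, 2048, 512];

foreach ($sizes as $size) {
    echo formatKilobytes($size), PHP_EOL;
}

// array_sum adds up all values, replacing the manual loop.
$total = array_sum($sizes);
echo 'Total: ', formatKilobytes($total), PHP_EOL;
```
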

Size, md5 and update time

Now that we know how big the files are we also want to hash their contents, which means reading out their contents before hashing them with md5. That might sound simple, and with the API shown it is, but that is just syntactic sugar around the file open and read stream calls. Another thing I've sneaked into this example is a touch call on the file object. When touch is called the file is either created or its access time is updated. The effects of that are shown in the demo.

<?php

require 'vendor/autoload.php';

$loop = \React\EventLoop\Factory::create();
$dir = \React\Filesystem\Filesystem::create($loop)->dir(dirname(__DIR__));
$dir->lsRecursive()->then(function (\SplObjectStorage $list) {
    $phpFiles = new RegexIterator($list, '/\.php$/');
    $promises = [];
    foreach ($phpFiles as $node) {
        if (strpos($node->getPath(), 'vendor') !== false) {
            continue;
        }
        $file = $node;
        $contents = $file->getContents()->then(function ($contents) {
            return md5($contents);
        });
        $promises[] = \React\Promise\all([$file->stat(), $contents])->then(function ($data) use ($file) {
            list ($stat, $md5) = $data;
            echo substr($file->getPath(), strlen(dirname(__DIR__)));
            echo ': ', number_format($stat['size'] / 1024, 2), 'KB, ';
            echo 'md5 hash: ', $md5, ', ';
            echo 'access time: ', (new DateTime('@' . $stat['atime']))->format('r'), PHP_EOL;
            $file->touch();
            return $stat['size'];
        });
    }
    \React\Promise\all($promises)->then(function ($sizes) {
        $total = 0;
        foreach ($sizes as $size) {
            $total += $size;
        }
        echo 'Total: ', number_format($total / 1024, 2), 'KB', PHP_EOL;
    });
});

$loop->run();
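
For comparison, this is roughly what the same three steps look like with PHP's built-in blocking functions. It is only a sketch on a throwaway temp file, not how react/filesystem works internally, but it shows where the size and atime values come from and what a touch does to them:

```php
<?php

// Create a throwaway file so the example has something to inspect.
$path = tempnam(sys_get_temp_dir(), 'fs-demo-');
file_put_contents($path, 'hello');

// stat() returns, among others, the 'size' and 'atime' keys used above.
$stat = stat($path);
echo 'size: ', $stat['size'], ' bytes', PHP_EOL;
echo 'md5 hash: ', md5_file($path), PHP_EOL;
echo 'access time: ', (new DateTime('@' . $stat['atime']))->format('r'), PHP_EOL;

// Set both timestamps to a fixed moment so the change is easy to spot.
// Without the extra arguments touch() uses the current time instead.
$moment = 1000000000;
touch($path, $moment, $moment);

// PHP caches stat() results per path, so clear the cache before re-reading.
clearstatcache(true, $path);
$after = stat($path);
echo 'access time after touch: ', (new DateTime('@' . $after['atime']))->format('r'), PHP_EOL;

// Remember the hash for later inspection, then clean up.
$hash = md5('hello');
unlink($path);
```

Every one of these calls blocks until the disk answers, which is exactly what the promise based API avoids.
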

Community examples

No community examples this week; there aren't any examples out there using it in open source as far as I could find.

Examples

All the examples from this post can be found on GitHub.

Conclusion

The filesystem package is still in the works but has already shown its brute power with a simple and easy to use API. While developing it I've seen it hit 50MB/s both on a whole cluster of small files and on a single big file read. These figures are all relative, as they come from a mid-range machine I initially bought to give talks with. At the same time I've seen it peak at 100MB/s and hold there for a couple of seconds. Honestly, EIO scares me at times, but in small portions it is great to work with. The examples above show how easy it is to use, but that simplicity makes it just as easy to go too far with it as it is to get started. So with great power comes great responsibility. I've been considering adding call pools so that only a limited number of I/O operations can be outstanding at any given time and you can't overdo it.
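
To give an idea of what such a call pool could look like, here is a rough sketch in plain PHP. Nothing in it exists in react/filesystem: the CallPool class, its methods and the fake operations are all assumptions made up for illustration. Each operation receives a callback that it must call when it is done, and the pool never has more than the given number in flight:

```php
<?php

// Hypothetical helper, not part of react/filesystem: caps the number of
// operations in flight at the same time and queues the rest.
final class CallPool
{
    private $limit;
    private $running = 0;
    private $queue;
    public $peak = 0; // Highest number of simultaneous operations seen.

    public function __construct(int $limit)
    {
        $this->limit = $limit;
        $this->queue = new SplQueue();
    }

    // $operation receives a $done callback it must call once finished.
    public function enqueue(callable $operation): void
    {
        $this->queue->enqueue($operation);
        $this->drain();
    }

    // Starts queued operations for as long as there is room in the pool.
    private function drain(): void
    {
        while ($this->running < $this->limit && !$this->queue->isEmpty()) {
            $operation = $this->queue->dequeue();
            $this->running++;
            $this->peak = max($this->peak, $this->running);
            $operation(function () {
                $this->running--;
                $this->drain();
            });
        }
    }
}

$pool = new CallPool(2);
$pending = [];  // Operations that have started but not yet finished.
$finished = [];

foreach (['a.php', 'b.php', 'c.php', 'd.php', 'e.php'] as $path) {
    $pool->enqueue(function (callable $done) use ($path, &$pending, &$finished) {
        // Stand-in for an async call such as opening a file.
        $pending[] = function () use ($path, $done, &$finished) {
            $finished[] = $path;
            $done();
        };
    });
}

// Complete the operations one by one, the way an event loop would.
while ($pending) {
    $complete = array_shift($pending);
    $complete();
}

echo 'Finished: ', implode(', ', $finished), PHP_EOL;
echo 'Peak in flight: ', $pool->peak, PHP_EOL;
```

In a real version the operations would return promises instead of taking a callback, but the bookkeeping stays the same: count what is running, queue the rest, and start the next one whenever something finishes.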