Joseph Silber

Lazy Collections in Laravel

Laravel's LazyCollection class is a powerful tool that lets you process enormous amounts of data with very little memory. It's a recent addition to the framework (introduced in Laravel 6), and is not that well known yet.

To get people more familiar with the power of lazy collections, I gave a talk at the first Laravel Worldwide Meetup:

Doing More With Less: Lazy Collections in Laravel

A deep dive into Lazy Collections. We'll learn what they are, how they work under the hood, and how to use them to vastly decrease your app's memory footprint.

This post is the written counterpart of that talk. There's only so much you can cram into a 30-minute talk, so this post also has a ton of additional information that didn't make the cut for the talk.


We'll start by looking at regular collections and how they're not really suitable for huge amounts of data. Then we'll see how lazy collections can help us with that.

Regular Collections

The Collection class has been a staple of Laravel since time immemorial. As the documentation says, regular collections wrap a native PHP array, providing a fluent, convenient API to interact with the underlying array.

To create a regular collection, you pass it an array of values. To play around, let's start with a simple array of numbers:

use Illuminate\Support\Collection;

new Collection([1, 2, 3, 4, 5]);

In fact, Laravel collections have a handy times method, which is a convenient way to create a collection with a range of numbers:

Collection::times(100); // [1, 2, 3, ... 98, 99, 100]

Once we have a collection instance, we can start chaining methods onto it:

Collection::times(100)  // [1, 2, 3, ... 98, 99, 100]
    ->map(fn ($number) => $number * 2)  // [2, 4, 6, ... 196, 198, 200]
    ->filter(fn ($number) => $number % 20 == 0); // [20, 40, 60, ... 160, 180, 200]

While this simplified example isn't really useful in real life, it shows an important fact about regular collections: all values are held in memory, and every method call creates a new in-memory array of values (wrapped in a new Collection instance).
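
To make that concrete, here is a rough plain-PHP analogue of the chain above. It is only a sketch: the variable names are made up for illustration, and native array functions stand in for the collection methods. What it shows is that every step builds its own complete array before the next step starts.

```php
<?php

// Each step below materializes a full array before the next one begins.
$numbers = range(1, 100);                                     // [1, 2, ... 100]
$doubled = array_map(fn ($number) => $number * 2, $numbers);  // [2, 4, ... 200]
$multiplesOfTwenty = array_filter(
    $doubled,
    fn ($number) => $number % 20 == 0
); // values: 20, 40, ... 200

// Three separate arrays now exist side by side in memory.
printf(
    "%d, %d and %d values\n",
    count($numbers),
    count($doubled),
    count($multiplesOfTwenty)
); // prints "100, 100 and 10 values"
```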

Running Out of Memory

Keeping all values in memory is okay when we have a relatively short list, but as the amount of data we're dealing with starts to grow, we'll quickly run out of memory.

To illustrate, let's try creating a collection with a billion values:

Collection::times(1000 * 1000 * 1000);

If you try to run this on your computer, you'll most likely get an error that you've run out of memory:

Allowed memory size of 3154116608 bytes exhausted (tried to allocate 34359738376 bytes)

The reason for this is quite simple: the times method creates a collection that stores all its values in memory. Trying to allocate memory for a billion numbers will obviously exceed the amount of memory available.

Furthermore, even if we only want to work with a small subset of the collection (for example, to take the first 1,000 even numbers):

Collection::times(1000 * 1000 * 1000)
    ->filter(fn ($number) => $number % 2 == 0)
    ->take(1000);

...it'll still blow up, because each step builds an entire collection in memory. When we call the times method, it has no way of knowing that we're about to filter its values; that only happens in the next step.

Switching to Lazy Collections

If we try the above code using a lazy collection:

use Illuminate\Support\LazyCollection;

$collection = LazyCollection::times(1000 * 1000 * 1000)
    ->filter(fn ($number) => $number % 2 == 0)
    ->take(1000);

...we won't run out of memory, since none of those values have been generated yet (more on this shortly). In fact, this code snippet uses virtually no memory at all!

How's that possible? Through the power of Generators.

To fully understand lazy collections, we first need a solid understanding of PHP generators:

Generator Functions in PHP

The "generator function" is a powerful construct introduced in PHP 5.5. Trying to read the PHP documentation on it can be quite daunting, so let's break it down step by step, learning it in mini, bite-sized lessons.

Generator functions may return multiple values

Regular functions in PHP can only return a single value. After returning the first value, all subsequent return statements will be ignored.

function run() {
    return 1;
    return 2;
}

dump(run());

This will only dump 1. After the first return statement, the function is terminated, and no further code in the function is executed.

To return multiple values from a function, we can use the yield keyword instead of return.

function run() {
    yield 1;
    yield 2;
}

dump(run());

But hold on! If you actually run this code, you'll probably be surprised by its output, which will be something like this:

Generator {
    executing: {...}
    closed: false
}

A Generator object? We never created this object, nor did we return that from our run function. So where is it coming from?

Lesson 1: This is what "generator functions" are. The mere presence of the yield keyword in a function instructs PHP that this is not an ordinary function, but a generator function. Generator functions are treated entirely differently from regular functions.

We can see this by adding a dump to our generator function:

function run() {
    dump('Did we get here?');
    yield 1;
    yield 2;
}

dump(run());

This will not dump our "Did we get here?" message at all. All we get is the same Generator object as before. Why?

Lesson 2: Calling a generator function does not even execute any of the code inside of the function body. Instead, we get a Generator object, which is a mechanism by which we can step through the function's code step by step, pausing at each yield statement.
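
Here is a small sketch that makes Lesson 2 observable without relying on dump. The logged_run function and the $log object are hypothetical names used only for this illustration; the generator function records each step it reaches, so we can check how far the body has run.

```php
<?php

// Hypothetical helper for this sketch: records each step it reaches in $log.
function logged_run(ArrayObject $log): Generator
{
    $log->append('function body started');
    yield 1;
    $log->append('resumed after first yield');
    yield 2;
}

$log = new ArrayObject();
$generator = logged_run($log);

var_dump($generator instanceof Generator); // bool(true)
var_dump(count($log));                     // int(0): the body has not run yet
```

The log is still empty after the call, which is exactly the point: all we have so far is a paused Generator object.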

Using current and next to get the generator function's values

To start stepping through the code in the generator function, use the Generator's current method. This will actually start executing the code until the first yield statement, and return the value that was yielded.

function run() {
    yield 1;
    yield 2;
}

$generator = run();

$firstValue = $generator->current();

dump($firstValue);

Running the code above will give us 1, which is the first value that is yielded from within the generator function.

Lesson 3: Use the Generator's current method to "power up" the generator function. The code in the function will execute till the first yield statement, and its value will be returned.

Unlike regular functions, the generator function is not terminated after the first value is returned. The function is still alive, waiting for us to get the next value from the Generator.

This time, instead of dumping the current value right away, let's first move the generator to the next yield statement, and only dump that second value:

function run() {
    yield 1;
    yield 2;
}

$generator = run();

$firstValue = $generator->current();
$generator->next();
$secondValue = $generator->current();

dump($secondValue);

This will dump 2, which is the value yielded by the second yield statement.

Lesson 4: Use the Generator's next method to advance the generator function to the next yield statement.

Using yield in a loop

Up until now, we've dealt with multiple hard-coded yield statements inside the generator function. But the true power of the yield statement can only be realized once we start using it in a loop:

function generate_numbers()
{
    $number = 1;

    while (true) {
        yield $number;

        $number++;
    }
}

$generator = generate_numbers();

dump($generator->current()); // Dumps: 1
$generator->next();
dump($generator->current()); // Dumps: 2
$generator->next();
dump($generator->current()); // Dumps: 3

Wait, what? An infinite loop???

Yep! Since the code in each iteration of the loop will not execute on its own, this loop is not actually infinite. It will only produce as many values as we pull out of it (by using a combination of current and next).

Lesson 5: Using yield from within an infinite loop does not actually cause an infinite loop. Since the generator function's execution pauses after every yield, it'll only run the loop for as many iterations as the number of values that are later requested.

Using foreach with Generators

Constantly calling current and next will get tiring really fast. So instead of having to do that manually, PHP supports passing a Generator directly to foreach!

Let's try dumping the first 20 numbers from our generator:

$generator = generate_numbers();

foreach ($generator as $number) {
    dump($number);

    if ($number == 20) break;
}

Notice the break there? Since we have an "infinite loop" in the generator function, we must take care to only pull a finite number of values out of it. Otherwise this foreach will cause an actual infinite loop.

Lesson 6: Use foreach to easily enumerate all values in a generator. If there's an infinite loop in your generator function, take care to stop your foreach loop at some point.
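
For the curious, here is roughly what foreach does on our behalf. This is a sketch rather than a description of PHP's internals: it drives the generator by hand through the standard Iterator methods (rewind, valid, current, next) that every Generator object exposes. The count_up function is the same idea as generate_numbers, renamed so the snippet stands on its own.

```php
<?php

// Same idea as generate_numbers() above, renamed so this sketch stands alone.
function count_up(): Generator
{
    $number = 1;

    while (true) {
        yield $number++;
    }
}

$generator = count_up();
$seen = [];

// Roughly what foreach does for us: rewind, then check, read and advance.
for ($generator->rewind(); $generator->valid(); $generator->next()) {
    $number = $generator->current();
    $seen[] = $number;

    if ($number == 5) {
        break; // the generator is infinite, so we have to stop ourselves
    }
}

echo implode(', ', $seen), PHP_EOL; // prints "1, 2, 3, 4, 5"
```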

Composing Generator Functions

Now, instead of having to add in a break manually, let's create a general helper function called take. We'll pass it 2 arguments: a generator, and a "limit":

function take($generator, $limit)
{
    foreach ($generator as $index => $value) {
        if ($index == $limit) break;

        yield $value;
    }
}

Note: The above take helper is intentionally very naive, to keep the sample simple. For a more robust version of it, read the source (you should be able to understand that at the end of this post).

Now that we have this take helper in place, we can compose the two generator functions together (composition is just a fancy word that means: using two functions together by passing the result of one function into the other function). We'll pass the numbers generator to the take generator function, and use that in our foreach loop:

$allNumbersGenerator = generate_numbers();
$twentyNumbersGenerator = take($allNumbersGenerator, 20);

foreach ($twentyNumbersGenerator as $number) {
    dump($number);
}

We can also do this directly inline, without the intermediate variables (shown above for clarity):

foreach (take(generate_numbers(), 20) as $number) {
    dump($number);
}

Lesson 7: You can create helper generator functions. They take a generator, and return a different generator based on the values in the first one. This lets you compose a whole chain of operations together, which is the actual secret sauce that drives Laravel's lazy collections!
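
To see a slightly longer chain, here is a sketch that adds a second helper generator function and composes it with a take-style helper. All three function names (natural_numbers, keep_if, take_first) are made up for this example. Unlike the naive take above, take_first counts the values it has yielded instead of relying on keys, and it stops as soon as the limit is reached.

```php
<?php

// An endless source of numbers: 1, 2, 3, ...
function natural_numbers(): Generator
{
    $number = 1;

    while (true) {
        yield $number++;
    }
}

// Yields only the values for which $predicate returns true.
function keep_if(iterable $values, callable $predicate): Generator
{
    foreach ($values as $value) {
        if ($predicate($value)) {
            yield $value;
        }
    }
}

// Yields at most $limit values, then stops pulling from the source.
function take_first(iterable $values, int $limit): Generator
{
    if ($limit <= 0) {
        return;
    }

    foreach ($values as $value) {
        yield $value;

        if (--$limit === 0) {
            return;
        }
    }
}

$firstFiveEven = take_first(
    keep_if(natural_numbers(), fn ($number) => $number % 2 == 0),
    5
);

$result = iterator_to_array($firstFiveEven, false);

echo implode(', ', $result), PHP_EOL; // prints "2, 4, 6, 8, 10"
```

Nothing runs until iterator_to_array starts pulling values through the chain, one at a time.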

Now that we have a solid understanding of generators and generator functions in native PHP, let's get back to Lazy Collections.

Lazy Collections Wrap a Generator Function

Contrary to regular (eager) collections, which wrap a native PHP array, lazy collections wrap a native PHP generator function.

Let's wrap a lazy collection around our numbers generator function:

use Illuminate\Support\LazyCollection;

$collection = LazyCollection::make(function () {
    $number = 1;

    while (true) {
        yield $number++;
    }
});

We now have a collection that "holds" a sequence of all numbers. I put "holds" in quotes because, as we've seen, these numbers haven't been generated yet.

Being a collection means we have access to the countless methods provided to us by Laravel. In fact, the eager Collection class and the LazyCollection class both implement the same Enumerable interface, providing the same set of APIs to both collection classes.

Using our lazy collection of all numbers, let's just take the first 10 of them, using the collection's take method:

$firstTenNumbers = $collection->take(10);

Ah... that's a much nicer API than our own function composition above!

Each method that we chain onto our lazy collection returns a new LazyCollection instance, which internally wraps a new generator function.
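
To get a feel for that idea, here is a deliberately tiny toy class. It is not Laravel's implementation, and the TinyLazy name is invented for this sketch. It only illustrates the pattern: the object wraps a closure that produces a generator, and each chained method returns a new object wrapping a new generator function that reads from the previous one.

```php
<?php

// A toy stand-in for the pattern, not Laravel's actual LazyCollection.
final class TinyLazy implements IteratorAggregate
{
    private Closure $source;

    public function __construct(Closure $source)
    {
        $this->source = $source;
    }

    // Calling the closure gives us a fresh generator every time we iterate.
    public function getIterator(): Traversable
    {
        return ($this->source)();
    }

    public function filter(callable $predicate): self
    {
        return new self(function () use ($predicate) {
            foreach ($this as $value) {
                if ($predicate($value)) {
                    yield $value;
                }
            }
        });
    }

    public function take(int $limit): self
    {
        return new self(function () use ($limit) {
            if ($limit <= 0) {
                return;
            }

            foreach ($this as $value) {
                yield $value;

                if (--$limit === 0) {
                    return;
                }
            }
        });
    }
}

$numbers = new TinyLazy(function () {
    $number = 1;

    while (true) {
        yield $number++;
    }
});

$evens = $numbers->filter(fn ($number) => $number % 2 == 0)->take(3);

echo implode(', ', iterator_to_array($evens, false)), PHP_EOL; // prints "2, 4, 6"
```

Because the class stores a closure rather than a Generator object, iterating the same instance twice simply starts a new generator each time.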

To bring this full circle, let's revisit our example from the beginning of this post (with a slight twist):

$collection = LazyCollection::times(INF)
    ->filter(fn ($number) => $number % 2 == 0)
    ->take(1000);

We start out with a collection that "holds" an infinite number of items, then filter them to only even numbers, and then take the first 1,000 values.

As we already know by now, not a single value has been generated yet. That's why this code uses almost no memory at all. These numbers will only be generated once we start enumerating them (using foreach, or the collection's each method):

LazyCollection::times(INF)
    ->filter(fn ($number) => $number % 2 == 0)
    ->take(1000)
    ->each(fn ($number) => dump($number));

Now, and only now, will those numbers be generated!

Exercise: How many numbers will be generated in total? Don't just keep reading. Think for a second before you proceed. In the end, how many numbers will be generated by the first generator in this snippet above?

The answer is two thousand. The code snippet above will generate a total of two thousand values. Why? Think about how the filter function works: it pulls values out of the original generator, discards any values that don't pass the filter, and then yields only the values that do pass the filter. So to get to 1,000 even numbers, it had to throw away 1,000 odd numbers!
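
If you'd like to check that answer yourself, here is a plain-PHP sketch that counts how many values the source actually produces. The helper names are hypothetical, and it assumes a take that stops pulling from its source as soon as it has yielded its last value, which is what the answer above implies.

```php
<?php

$generated = 0; // how many values the source generator has produced

// Endless source that counts every value it yields.
$source = (function () use (&$generated) {
    for ($number = 1; ; $number++) {
        $generated++;
        yield $number;
    }
})();

// Passes along even numbers and discards the rest.
function only_even(iterable $values): Generator
{
    foreach ($values as $value) {
        if ($value % 2 == 0) {
            yield $value;
        }
    }
}

// Yields at most $limit values, then stops pulling from the source.
function take_first(iterable $values, int $limit): Generator
{
    foreach ($values as $value) {
        yield $value;

        if (--$limit === 0) {
            return;
        }
    }
}

$kept = 0;

foreach (take_first(only_even($source), 1000) as $number) {
    $kept++;
}

echo "generated: {$generated}, kept: {$kept}", PHP_EOL;
// prints "generated: 2000, kept: 1000"
```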


At this point, you hopefully understand how lazy collections work under the hood (to learn more, take a deep dive into the source). Next up, let's get away from using simple numbers, and see how we can use lazy collections in a real-world scenario.

Streaming File Downloads With Lazy Collections

One of the most useful use-cases for lazy collections is streaming data exports to a download file. Streaming an export gives us many benefits:

  1. We don't have to keep all source records in memory.
  2. We don't have to build up the whole export file in memory.
  3. The user doesn't have to wait till the whole file is built up on the server. The download can start instantly!

Let's see how we can use a lazy collection to stream a CSV file to the browser, using the versatile league/csv package.

To start, let's create a lazy collection with a million dummy login logs:

$logins = LazyCollection::times(1000000, fn () => [
    'user_id' => 24,
    'name' => 'Houdini',
    'logged_in_at' => now()->toIsoString(),
]);

Now that we have this huge dataset (which hasn't actually been generated yet), let's see how we can stream it as a CSV file directly to the browser, without having to build it up as a huge CSV string in memory.

Streamed Downloads in Laravel

First off, since we're using Laravel, we need to figure out how streamed downloads work in Laravel. Laravel expects all routes to return a response (either a View, something that may be converted to JSON, or an actual Response object). Route handlers should never directly output anything to the client (by using echo et al).

For streamed downloads, use a StreamedResponse, which uses a callback to stream the response when the framework is ready for it. Within the callback, we can output stuff directly to the client:

Route::get('streamed-download', function () {
    return response()->streamDownload(function () {
        // From within here we can "echo"
        // or write to the PHP output stream
    }, 'the-filename.txt');
});

Writing to the PHP Output Buffer

Next up, let's see how to use league/csv's Writer class to stream a CSV file to the browser. The Writer expects a file or a stream, to which the CSV records will be written. We'll use the native php://output stream which, as the name suggests, lets us write to the output buffer directly. From the docs:

php://output is a write-only stream that allows you to write to the output buffer mechanism in the same way as print and echo.
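
Before bringing in league/csv, here is a minimal sketch of the same idea using only native PHP. It swaps the Writer class for PHP's built-in fputcsv, and the dummy_logins and stream_csv functions are hypothetical names for this example. Rows come out of a generator one at a time and are written straight to php://output.

```php
<?php

// Lazily produces $count dummy login rows, one at a time.
function dummy_logins(int $count): Generator
{
    for ($i = 1; $i <= $count; $i++) {
        yield [
            'user_id' => 24,
            'name' => 'Houdini',
            'logged_in_at' => gmdate('c'),
        ];
    }
}

// Writes a header plus every row to the output stream as CSV.
function stream_csv(iterable $rows, array $header): void
{
    $output = fopen('php://output', 'w');

    fputcsv($output, $header, ',', '"', '\\');

    foreach ($rows as $row) {
        // Only the current row is held in memory at this point.
        fputcsv($output, array_values($row), ',', '"', '\\');
    }

    fclose($output);
}

stream_csv(dummy_logins(3), ['User ID', 'Name', 'Login Time']);
```

Run from the command line, this prints a header line followed by three rows. Inside a streamed response callback, the same writes would go to the client instead.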

Putting It All Together

Putting it all together, we end up with this:

use Illuminate\Support\LazyCollection;
use League\Csv\Writer;

Route::get('streamed-download', function () {
    $logins = LazyCollection::times(1000 * 1000, fn () => [
        'user_id' => 24,
        'name' => 'Houdini',
        'logged_in_at' => now()->toIsoString(),
    ]);

    return response()->streamDownload(function () use ($logins) {
        $csvWriter = Writer::createFromFileObject(
            new SplFileObject('php://output', 'w+')
        );

        $csvWriter->insertOne(['User ID', 'Name', 'Login Time']);

        $csvWriter->insertAll($logins);
    }, 'logins.csv');
});

To recap:

  1. We create a lazy collection with a million dummy records of login logs.
  2. The route returns a StreamedResponse, with a callback that the framework will call once it's ready for the stream.
  3. Inside the callback, we create a CSV Writer using the php://output stream, so that we can write CSV records directly to the streaming download file.
  4. insertOne adds a single row in the CSV file, which acts as the header row of column names.
  5. The lazy collection is passed directly to insertAll, which internally uses foreach to spin through all records in the generator.

At no point do we have all of our records in memory; not the original raw data, nor the generated CSV. Each record is generated one by one, and immediately written to the streaming file in the user's browser!


We learned how to write data in a lazy manner. Now we'll learn how to read data lazily.

Reading Files Lazily With a Lazy Collection

Another area where lazy collections are extremely helpful is reading a file lazily. That is: reading a file line by line, processing each line as it comes in, never loading the whole file into memory.

NDJSON Log Files

A good example for reading files as a stream involves using the NDJSON format (also called newline-delimited JSON):

NDJSON is a convenient format for storing or streaming structured data that may be processed one record at a time. It's a great format for log files.

Here's how it would look to store recent logins as NDJSON:

{ "user_id": 2, "name": "Alice", "timestamp": "2020-07-29T22:51:30.352869Z" }
{ "user_id": 1, "name": "Jinfeng", "timestamp": "2020-07-29T22:54:05.280122Z" }
{ "user_id": 2, "name": "Alice", "timestamp": "2020-07-29T22:54:16.565840Z" }

Notice that this isn't an array, nor are there commas at the end of each line. Each line is a self-contained JSON object, which can be parsed and processed on its own.
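
Here is a short plain-PHP sketch of that round trip, using a couple of made-up records. Encoding is nothing more than json_encode plus a newline per record, and decoding handles each line in isolation.

```php
<?php

$logins = [
    ['user_id' => 2, 'name' => 'Alice'],
    ['user_id' => 1, 'name' => 'Jinfeng'],
];

// Encode: one JSON document per line, each terminated by a newline.
$ndjson = '';

foreach ($logins as $login) {
    $ndjson .= json_encode($login) . "\n";
}

// Decode: every non-empty line can be parsed on its own.
$names = [];

foreach (explode("\n", $ndjson) as $line) {
    if ($line === '') {
        continue;
    }

    $names[] = json_decode($line, true)['name'];
}

echo implode(', ', $names), PHP_EOL; // prints "Alice, Jinfeng"
```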

Writing a File to Disk Lazily

To play around with this, we need an actual NDJSON file. Let's take a little detour, and see how to create such a file with a bunch of fake data, using a lazy collection:

LazyCollection::times(100 * 1000)
    ->flatMap(fn () => [
        ['user_id' => 1, 'name' => 'Jinfeng'],
        ['user_id' => 2, 'name' => 'Alice'],
    ])
    ->map(fn ($user, $index) => array_merge($user, [
        'timestamp' => now()->addSeconds($index)->toIsoString(),
    ]))
    ->map(fn ($entry) => json_encode($entry))
    ->each(fn ($json) => Storage::append('logins.ndjson', $json));

This will create a file at storage/app/logins.ndjson and add 200,000 fake login records, all without ever holding more than a single line in memory!

Reading a File Line by Line

Now that we have a log file, let's play around with it and see how we can run some statistics on this file, without ever loading the whole thing into memory.

To create a lazy collection that reads a single line of a file at a time, we can use the native PHP fopen and fgets functions, as shown in the 4th example of the original lazy collections PR:

$logins = LazyCollection::make(function () {
    $handle = fopen(storage_path('app/logins.ndjson'), 'r');

    while (($line = fgets($handle)) !== false) {
        yield $line;
    }
});
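
One thing the snippet above leaves out, for brevity, is closing the file handle. Here is a sketch of a standalone generator function that does the same reading but cleans up after itself. The read_lines name is hypothetical, and the demo uses a throwaway temp file so that it runs outside of Laravel.

```php
<?php

// Hypothetical helper: yields one line at a time and always closes the handle.
function read_lines(string $path): Generator
{
    $handle = fopen($path, 'r');

    if ($handle === false) {
        throw new RuntimeException("Unable to open {$path}");
    }

    try {
        while (($line = fgets($handle)) !== false) {
            yield rtrim($line, "\r\n");
        }
    } finally {
        // Runs when the file is exhausted or the generator is destroyed early.
        fclose($handle);
    }
}

// Demo with a throwaway file so the sketch runs anywhere.
$path = tempnam(sys_get_temp_dir(), 'ndjson');
file_put_contents($path, "{\"name\":\"Alice\"}\n{\"name\":\"Jinfeng\"}\n");

$lineCount = 0;

foreach (read_lines($path) as $line) {
    $lineCount++;
}

unlink($path);

echo $lineCount, PHP_EOL; // prints "2"
```

Since LazyCollection::make takes a closure that produces a generator, a helper like this should slot in as LazyCollection::make(fn () => read_lines($path)), though treat that as an assumption to verify in your own app.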

We now have a $logins lazy collection, where each item in the collection is a standalone JSON string. Let's see how many times Alice has logged in:

$loginCountForAlice = $logins
    ->map(fn ($json) => json_decode($json))
    ->filter() // In case we have empty lines
    ->where('name', 'Alice')
    ->count();

We now have the total count of how many times Alice has logged in (100,000 times), without ever keeping more than a single log entry in memory!

Reading & Streaming Files

For our last practical example, we'll combine reading and streaming a file download, using a single lazy collection.

Combining everything we've learned so far, we can parse the log file and stream it to the user as a CSV file download:

use Illuminate\Support\LazyCollection;
use League\Csv\Writer;

Route::get('read-and-stream', function () {
    $logins = LazyCollection::make(function () {
        $handle = fopen(storage_path('app/logins.ndjson'), 'r');

        while (($line = fgets($handle)) !== false) {
            yield $line;
        }
    })
    ->map(fn ($json) => json_decode($json, true))
    ->filter();

    return response()->streamDownload(function () use ($logins) {
        $csvWriter = Writer::createFromFileObject(
            new SplFileObject('php://output', 'w+')
        );

        $csvWriter->insertOne(['User ID', 'Name', 'Login Time']);

        $csvWriter->insertAll($logins);
    }, 'logins.csv');
});

Converting a Regular Collection Into a Lazy Collection

Before we go, let's quickly see how we can convert a regular collection into a lazy collection, and why we would even want to do that.

Let's say we're dealing with a huge dataset that can't be streamed, such as the results of a single API call:

use Illuminate\Support\Collection;

function get_all_customers_from_quickbooks() : Collection
{
    // Run some code that gets all QuickBooks customers,
    // and return it as a regular, eager collection...
}

The actual implementation of that function is not important. All we care about is that it returns a huge eager collection that can't be streamed to us, so we have to keep it all in memory.

However, lazy collections may still come in handy. To see how, let's count the number of customers in France who have a balance over 100 euros:

$count = get_all_customers_from_quickbooks()
    ->where('country', 'FR')
    ->where('balance', '>', 100)
    ->count();

Simple, right? But let's look at what's happening under the hood: every time we call where, we're actually creating a brand new Collection, which means we're creating another array in memory to hold all filtered values.

We can do better.

Even though the original values are all held in memory, the subsequent filters shouldn't have to store their values in memory. We can use the lazy method on the regular Collection class to convert it to a LazyCollection:

$count = get_all_customers_from_quickbooks()
    ->lazy()
    ->where('country', 'FR')
    ->where('balance', '>', 100)
    ->count();

While the original values will still all be kept in memory, no additional memory will be allocated when we filter its results. Pretty neat.
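
The same trade-off can be sketched in plain PHP. The data and the where_lazy helper below are invented for illustration: the source array is fully in memory, but each filter is a generator that passes matching rows along one at a time instead of building a new array.

```php
<?php

// Stand-in data; a real call would return far more rows.
$customers = [
    ['name' => 'Amelie', 'country' => 'FR', 'balance' => 250],
    ['name' => 'Bruno', 'country' => 'FR', 'balance' => 40],
    ['name' => 'Carla', 'country' => 'IT', 'balance' => 900],
    ['name' => 'Denis', 'country' => 'FR', 'balance' => 101],
];

// Hypothetical helper: yields matching rows one at a time, no new array.
function where_lazy(iterable $rows, callable $predicate): Generator
{
    foreach ($rows as $row) {
        if ($predicate($row)) {
            yield $row;
        }
    }
}

$inFrance = where_lazy($customers, fn ($row) => $row['country'] === 'FR');
$overHundred = where_lazy($inFrance, fn ($row) => $row['balance'] > 100);

// Counting pulls each row through both filters exactly once.
$count = iterator_count($overHundred);

echo $count, PHP_EOL; // prints "2"
```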

Phew! That's a Wrap

That's gonna do it! This hopefully shed some light on how lazy collections work, and how you can use them to do more in your app with far less memory than you would otherwise need.

If you want to play around with the techniques we learned here, check out this GitHub repository, which has working examples of all code in this article. All of the relevant code is in the single routes/web.php file, each in its own route, ready for you to run in your browser and toy around with. Enjoy!


P.S. We didn't even get a chance to touch on the query builder's cursor method.

We'll leave that for another day.


Questions? Comments? Complaints? Ping me on Twitter.
