Johno the Coder

PHP Developer & Solutions Architect

Category: Laravel Page 1 of 2

Handling large data with cron jobs and queue workers

One problem a lot of developers seem to come into, from reading their code, is when they try to handle large sets of data with background scripts. There are a few common mistakes that I have seen from inheriting the code of my predecessors, and I thought I would offer some solutions to the problems that can come about.

I will cover off these issues as problems and solutions, and I’m going to use Laravel solutions, but conceptually it doesn’t really matter what framework you’re working in.

Problem #1 – The script takes too long (or many hours) to run!

This is quite a common one, particularly when inheriting code from days gone by, when the business and database were small. Then things grew, and suddenly that script which took ten minutes to run, takes many hours.

When I have to tackle these kinds of issues, assuming I understand the script and what it is doing there are a couple of ways to solve the issue;

Firstly, if your script is doing many different things, split it out into multiple scripts, which run at their appropriate times, which do a single job each. However, this probably isn’t the ideal.

Usually a script that is doing some daily processing looks something like this (in Laravel, but it’s not really relevant), this is awful example we’re calculating some cumulative business values for the customer.

This is not an actual feature, or actual code, I’ve written it as a deliberately simple example. Honestly I’ve not even tried to run this code, it’s just an example

$customers = Customer::all();
foreach($customers as $customer){
    $orderTotal = Order::where('customer_id', '=', $customer->id)->sum('subtotal);
    $thisMonthOrders = Order::where('customer_id','=',$customer->id)->where('created_at','>=',now()->subMonths(1)->format('Y-m-d'))->sum('subtotal');
    $customer->total_spend = $orderTotal;
    $customer->total_spend_this_month = $thisMonthOrders;
    $customer->save();
}

In this example we’re getting the total value of the orders this customer has made in their lifetime and the last month, and then saving it against the customer record. Presumably for easy/fast search and filtering or something.

This is fine until you start hitting many thousands or hundreds of thousands of sales and/or customers, then the script is going to take a very long time to run.

Firstly, we can optimise the script, to only fetch the orders in the last month and therefore only calculate totals for the affected customers, that would have a huge impact. Buut that’s not the point that I am trying to make here. (Though it would be relevant).

It would be much more efficient in terms of execution speed to do something like the following, though it does assume you have a Queue Worker or Laravel Horizon running (or some kind of mechanism to handle jobs on a queue)


foreach(Customer::all() as $customer){
    CalculateTotals::dispatch($customer);
}

// app/Jobs/CalculateTotals

class CalculateTotals implements ShouldQueue{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;
    protected $contact;
    public function __construct(Contact $contact)
    {
        $this->contact = $contact;
    }
    public function handle()
    {
        $this-contact->total_spend = Order::where('customer_id', '=', $customer->id)->sum('subtotal);
        $this-contact->total_spend_this_month = Order::where('customer_id','=',$customer->id)->where('created_at','>=',now()->subMonths(1)->format('Y-m-d'))->sum('subtotal');
        $this->contact->save();
    }
}

What we have actually done here, to speed up the execution time, is use the cron job to calculate what we should be handling/processing, but then passing off the work through to the job queues to handle. This means that you can have however many queue workers you have doing the processing and time consuming parts of the job. The cron will only be running for as long as it takes to find the relevant jobs to execute, and dispatch them via Redis or whatever queueing mechanism you’re using.

If you have 10,000 contacts to process, and it takes 2 seconds to process a contact, it would take 20,000 seconds to run the script.

If you run 20 queue workers, all all things are equal, the work from your script is now divided by 20 workers, or each worker would now have to deal with 500 jobs. 500 jobs multiplied by 2 seconds per job is 1,000 seconds to run (instead of 20,000).

NB: Personally, I would calculate these numbers on the fly, if they needed saving in the database I would calculate them on a listener to an event for when the orders were added, so they’re always live-ish, naturally I would not be using an ::all() as the basis to start any kind of processing in a large environment.

Problem #2 – The script uses too much memory, and crashes

This one depends on where the memory is coming from, but sticking with our above example…

$contacts = Contact::all();

Is a pretty bad place to start, it will work for a while, but if your database grows you’re eventually going to max out your memory trying to retrieve your whole data set.

$contacts = Contact::where('last_order_at', '>=', $carbonSomething->format('Y-m-d')->get();

This is a good place to start, whittle down how much you’re retrieving, but you might still be bringing back too much data.

Again, this is a demonstration, I’ve not run it, it’s just for the concept

$relevantContactsCount = Contact::where($qualifiers)->count();
$perPage = 10000;
$pages = ceil($relevantContactsCount / $perPage);
$currentPage = 1;
while($currentPage < $pages){
    $contacts = Contact::where($qualifiers)->orderBy('field','order')->limit($perPage)->offset(($currentPage - 1) * $perPage)->get();
    // Now we have 10,000 contacts only
    foreach($contacts as $contact){
        MyJobFromAbove::dispatch($contact); // Run it in a background job
    }
    $currentPage++;
}

Problem #3 – I can’t run my script as jobs, because it has to report back with totals, etc.

Now, this is an awkward one, but you simply need to change the way that you’re thinking about how you gather that information. You need a persistent way of storing the information, so that it can be summarised later.

There are a couple of ways to achieve this;

  1. When you start running the script, you create a record in the database or somewhere which you can report back to from your jobs (be wary of having every job run an incremental update for a “total contacts processed” type column!), or a way of consolidating the results
  2. Utilise caching mechanisms to remember certain information about what is running, so information can be shared between jobs and later on reported on
  3. Consolidate your results in somewhere very easy to calculate, and run your finalised report (which is perhaps emailed) once you know all processing is finished

Parting Thoughts

The normal rules of performance here still count…

  1. Only pull out from SQL what you actually need (select contact_id, subtotal, items_count from orders instead of select * from orders)
  2. Do what you can at SQL level, where possible SELECT SUM(subtotal) FROM orders WHERE contact_id = ?
  3. Ensure you have effective and efficient indices on the relevant columns
  4. With MySQL, for speed, use InnoDB tables, for saving storage space use MYISAM (very simplified!)
  5. Having issues with pulling information out of the database (rather than processing it?) consider NoSQL solutions like MongoDB
  6. Consider if there are better ways to get the data into your database, so that it can be retrieved more efficiently, or if there are other database design implementations which will suit your needs more closely
  7. Consider working in a near-live speed, instead of bulk processing, even if you hold that information off to be implemented the following day (for business logic purposes)

Dependency Inversion and Interface Segregation using Laravel’s Service Container

In very short, we’re going to cover the Interface Segregation Principle (ISP) and the Dependency Inversion Principle (DIP) in this article, and practically how we implement them within a Laravel environment.

Dependency Inversion Principle

  1. The principle is handy for the decoupling modules (making them independent of one another)
  2. Modules should not depend upon each other, but instead upon abstractions (in PHP there are generally interfaces)
  3. Abstractions should not be concerned with the implementation of functionality

Interface Segregation Principle

The next part we’re going to look at in this article is the Interface Segregation Principle, roughly covered in these two points

  1. No client should be forced to depend upon methods which it does not use
  2. Many specific interfaces are better than a single general purpose interface

Approaching the task

Now that we have recapped the principles, very briefly, here’s how we implement this.

  1. Split the required functionality into any parts which, according to the Single Responsibility Principle, would need to be separate – this will create interface segregation
  2. Understand the communication between any of these parts at its most generic level
  3. Create the interfaces (abstractions) through which this communication can be achieved
  4. Inject your dependencies based on the abstractions, achieving dependency inversion
  5. Resolve the abstractions to concretions with the Laravel service container

Implementing the Approach

The specification I was working to was (roughly);

Players should be able to send cash, bullets, and bonds to one another

Paraphrased specification

I’m going to quote some code below, which comes from this Pull Request

1. Split the requirements into single responsibilities

Several things happen when funds are transferred between accounts, so we need to logically split these

  1. Currency is credited to a character
  2. Currency is debited from a character
  3. A log of the transaction is stored
  4. A notification is sent

Logically, only the first 3 parts of this need to be programmed, because we can use Laravel Events, Listeners, and Notifications to handle the forth part.

Part 1 and 2 of this can be handled through a single class, which is responsible for the adjustment of currency amounts on a specific character.

So that leaves part 3, and the wrapping of parts 1 and 2 for our transferrer. With this separation, I now have 2 classes with single responsibilities:

  1. CurrencyHandler which is responsible for handling currency on characters (checking, crediting, debiting), single responsibility: this class will only change if the way in which we store/calculate currency is changed
  2. CurrencyTransferrer which is responsible for tying up the process of transferring currency from one character to another, single responsibility: this class will only change if the way which we carry out a transfer of currency changes

2. Understand the required communication

The transferrer depends upon the handler, because the transfer cannot happen without the handler. The next question is; what does the handler need to be able to communicate (whether answering questions, or receiving instructions)?

  1. Question: Is this (string) a valid currency? We have three in game
  2. Question: Is this amount valid for this character? i.e. can they afford to expend this amount
  3. Instruction: Add X amount of Y currency to this character
  4. Instruction: Remove X amount of Y currency from this character

And we also have to think about the transferrer, what do we need to be able to communicate with a currency transferrer?

  1. Instruction: Transfer X amount of Y currency from A character to B character

3. Create the Interfaces

From these single responsibilities, we can create 2 interfaces

  1. HandlesCurrency
  2. TransfersCurrency

HandlesCurrency defines the following methods to be implemented, according to the lines of communication we defined

public function creditCurrency(Character $character, string $currency, int $amount): void;

public function debitCurrency(Character $character, string $currency, int $amount): void;

public function characterHasOnHand(Character $character, string $currency, int $amount): bool;

public function validateCurrency(string $currency): bool;

Whereas, based on the required communication, the TransfersCurrency interface declares the following

public function transfer(Character $from, Character $to, string $currency, int $amount);

4. Segregate these interfaces from their concretions, and inject the abstractions

Laravel makes this bit exceptionally easy… (the following is within a Service Provider)

$this->app->bind(HandlesCurrency::class, CurrencyHandler::class);
$this->app->bind(TransfersCurrency::class, function () {
    return new CurrencyTransferrer(
        resolve(HandlesCurrency::class)
    );
});

So what we are doing here is…

  1. When I request the HandlesCurrency interface, Laravel will give me an instance of the CurrencyHandler class (concretion)
  2. When I request the TransfersCurrency interface, Laravel will give me an instance of CurrencyTransferrer
  3. The constructor method of CurrencyTransferrer requires an injection of its dependency of HandlesCurrency so what we do, is we tell Laravel to inject the appropriate resolution of HandlesCurrency (which is the CurrencyHandler) in for us
  4. Now, whenever we want to transfer currency between players we can simply request a CurrencyTransferrer

There are some really important things to note here;

  1. The CurrencyTransferrer does not know, nor care, how currency is handled throughout the game, it just knows that it needs to be able to ask some questions, and issue some instructions to the implementation
  2. Nowhere in the codebase (unit tests excluded) will the CurrencyHandler or the CurrencyTransferrer be mentioned, any implementation will only ever depend upon the abstractions thus, if I were to want to completely replace the CurrencyHandler, because it’s now handled by a microservice (for example), I would create a class that implements the interface, and change the service binding, every usage would be swapped

Just to wrap this full circle, let’s assume we have an API end point which transfers funds from one character to another, when we declare our Controller we would define the constructor something like as below (and Laravel would do the rest for us)

class CurrencyTransferController extends Controller
{
    protected $transferrer;
    public function __construct(TransfersCurrency $transferrer)
    {
        $this->transferrer = $transferrer;
    }
}

In Summary

To summarise here, what we’ve done is taken a piece of functionality which could’ve gotten very messy. We have split it into its reusable components, we have then inverted any dependency on those concretions, so we only depend upon the channels of communication which they have.

After doing all that we have completely interchangeable classes, we’ve used the Laravel service container, to understand how to resolve (and inject a dependency into a concretion of) our abstractions.

This is a very simple example, but what it does is demonstrate how to create highly decoupled code, following the I and the D from the SOLID principles.

Some disclaimers

  1. The Character model is injected directly, and is not inverted via an interface. I am porting old functionality across, and nothing throughout the system can function with the concept, however I think I probably will go and implement an interface to the effect of HasCurrency rather than accessing the fields directly. The other thing is, if I’m not careful I create lasagne here, I may, later on, decide to implement the HandlesCurrency interface directly upon the Character model, but in doing that I will be breaking the Single Responsibility Principle – I’m still toying with the different ways I could do this
  2. Strictly speaking the CurrencyHandler could be split into CreditsCurrency and DebitsCurrency – the reason I haven’t done this is because they are direct opposite functionalities. But now that I have noticed this, I may well go and change it
  3. I’m not perfect – there will be mistakes, I’m okay with that, but feel free to flag them, I am always happy to take on board other perspectives and learnings

Laravel Deep Dive – Mafia Online – Contents
I will update this as I add new articles

  1. Introduction
  2. Dependency Inversion and Interface Segregation using Laravel’s Service Container
  3. Achieving Single Responsibility with HTTP Requests in Laravel (coming soon)

Introduction to the Deep Dive

Hi everyone

I’ve been meaning to do deep dive articles for a while, and have always struggled to find the right code samples and stuff.

This weekend I made the decision to take a project I’ve been working on (Mafia Online) in a different direction, and port the codebase (which is shocking, for the reasons I talk about here) into a fresh Laravel installation.

Now, the scary part of that, is as follows; lots of people talk the talk frequently, but you can never really see what they’ve done, the things they’re capable of, and how they go about practicing what they preach.

Mafia Online was never supposed to go live. It only went live because a few of my friends liked it and I thought “well, why not?”, but because it was never supposed to go into production, it was built, well, shoddily.

This brings me to the next point of these deep dive articles. I want to cover, in depth, programming principles, software design patterns, and various other things. I need a real, working codebase to do this.

I also wanted to hold myself to account to not take shortcuts, so, I made the new Mafia Online codebase open source. No license attached, if you want to contribute to the project, awesome. It’s not a commercial project, so if you want to fork off and do your own stuff, great. Want to create your own clone of the game? Go for it.

You can find the repository on GitHub, and every now and then I am going to post real world code samples and examples of how to implement the things that you’ve heard of.

I realise I am likely to open myself to a world of pain here, but that is what kind of the point. I want to empower developers new and old, to learn new things, to make themselves vulnerable to criticism. This gives you the ability to learn new things through critique, but also empowers you to own your decisions. In front of the whole internet.

Laravel Deep Dive – Mafia Online – Contents
I will update this as I add new articles

  1. Introduction
  2. Dependency Inversion and Interface Segregation using Laravel’s Service Container
  3. Achieving Single Responsibility with HTTP Requests in Laravel (coming soon)

Simple open source package for Laravel RESTful APIs

Hi everyone

In my last post about creating simple search and filtering for Laravel I said I was considering creating a package to encapsulate this functionality.

Well, I got this done a few weeks ago and have now released it onto Composer (Packagist) as kya/laravel-rest

You can find the repository on GitHub here, and the documentation is in the Wiki.

There is space on there for issues and things, of course. So if you have any feature requests do let me know, the current release is v0.1.

I know it’s going to be useful for me, as I’ve used it a couple of times already; I hope it proves to be of some use for you, too.

Speak to you all soon, JTC

Model Search and Filtering in Laravel

Hi guys, as I have written this functionality a few times at this point, and I have been asked how to do it a couple of times, too. That is Laravel searching and filtering, I am going to develop a simple composer package which I can use to allow users to search and filter Laravel (Eloquent) Models using PHP traits which can be used on Laravel controllers.

Assumptions

I am going to work on the assumption that you are comfortable with basics.

I shall also assume you’re fairly comfortable with the Illuminate HTTP Request object, a Controller, and Eloquent Model.

Concept

So the first thing we want to tackle, as a concept, is that the best way, in my experience, is to start with a query, and then modify it as required. This allows us to optionally do almost anything, but equally allows us to do nothing, and simply return an unfiltered set if we want to.

This point would be true outside of Eloquent too, even if working with vanilla SQL.

Starting Off

To start with you want something like this within your Controller.

public function index(Request $request)
{
// Of course you can change this if you don't want to select everything
$query = YourModel::select('*');
}

Now we have the $query object which we can work with as and where we want to.

Filtering Records

Excluding deleted/is deleted, as we’ll cover that later on.

Let’s say you now want to filter results on author_id just to keep things simple.

if(!empty($request->author)){
$query->where('author_id', '=', $request->author);
}

Of course take into account everything here, you may need to do sanitisation or validation or anything else in there.

Multiple Filters, Save Lines

If you have many fields on which you wish to do this I usually find something like this is better

$fields = ['author_id', 'category_id'];
foreach($fields as $field){
if(!empty($request->$field)){
$query->where($field, '=', $request->$field);
}
}

That example assumes the exposure of your field names, you could always use an associative array so you could do “author” maps to “author_id” and so on, if you were so inclined.

Fluffy Search

Now you might want to do a fluffy search, taking the assumption of posts with content and an author name, and a category name (stored within the model) you could do something like this

if(!empty($request->search)){
$searchFields = ['title','content','author_name','category_name'];
$query->where(function($query) use($request, $searchFields){
$searchWildcard = '%' . $request->search . '%';
foreach($searchFields as $field){
$query->orWhere($field, 'LIKE', $searchWildcard);
}
})
}

Handling Trash

You probably want something which allows you to see soft deleted models, if you’re using them. I do something like this

if($request->trash == true){
// Retrieve soft deleted models only, you can only do this if you're using soft deletion on this model
$query->onlyTrashed();
}

Ordering and Limiting

Now to handle ordering and pagination – which Eloquent beautifully does for us, all we need to do is let the user modify it if they want to.

// Set the query ordering
$query->order($request->orderBy ?? 'updated_at', $request->order ?? 'DESC');

$perPage = $request->per_page ?? 25;

// It is a good idea to make sure we cap this
if($perPage > $maximumPerPage){
$perPage = $maximumPerPage;
}

// Final part - get the results
$results = $query->paginate($perPage);

// You will now need to do something with your results, and return some kind of Response - this could be a JSON response or adding the results to the data returned to the view, depending on your context

There’s an assumption here that our default order is updated at, last updated first, and that if not specified, we want to return 25 pages.

Of course you can store this information wherever you want.

In Summary, and Further Reading

Rather than creating a really complicated set of rules to handle all the things you may need from your API calls, start with a query, and then optionally modify that query.

Further Reading

Page 1 of 2

Powered by WordPress & Theme by Anders Norén