Using MySQL to return complex tagged results

Hi everyone

In this article I thought I would tackle complex tag searching, using MySQL. There are lots of ways this can be achieved at a PHP level, through looping and checking/comparing.

However, if you are working with a large dataset, and tagging is a required piece of functionality, for example in an enterprise blog, or a CRM, then you’ll need to get your results direct from the database, rather than loading into memory and looping, which can work in smaller applications.

We are going to cover 3 types of tag searching, and those types of search working with and correlating all 3 of those results together.

  • Record must have all specified tags (e.g. this contact must be both a customer and high value)
  • Record must have any specified tags (e.g. this post must be tagged with laravel or mvc)
  • Record must not have any specified tags (e.g. don’t show anybody from a CRM which has already received the “A” campaign, so that “A” customers don’t receive “B” campaign, too)

Outlining the example

To keep things simple, the example we’re going to work with is as follows;

  1. We have a customers table
  2. Every customer record has one, many, or no corresponding tags in the customer_tags table
  3. The customer_tags table has customer_id and tag_id bridging the customers and tags tables, respectively

The query we are working to needs to do return records which conform to the following tagged search

  1. Must have tags 999 and 888
  2. Must also have any of the tags 777, 666, 555
  3. Must not have any of the tags 444 or 333

Finding records which have all tags

The first thing we need, is a successful WHERE statement which will do this, of course, you are then going to want it to be programmatically generated.

In the example below we’re doing is joining on the mandatory tags, this is arguably the simplest part of the selection, because we’re doing an inner join – if the tag doesn’t exist, the record won’t be available, which means we don’t need to do any WHERE filtering etc.

Notice that I am naming the contact_tags joins as mh1 and mh2 (mh being short for must have)

/* Retrieve the fields we want */
SELECT id, first_name, last_name

/* Start with the customers table */
FROM customers c

/* For each MUST HAVE tag */
JOIN customer_tags mh1
ON c.id = mh1.customer_id
AND mh1.tag_id = 999

JOIN customer_tags mh2
ON c.id = mh2.customer_id
AND mh2.tag_id = 888

Finding records which also have any of the any-specified tags

Now we have 999 and 888 records only, we now need to make sure that we have one more more of the any tags (777, 666, and 555)

What we need here is to look for the connections; we need to do these as an outer join, as it is perfectly valid for the corresponding record to not be found.

Following that, we need to wrap a WHERE statement which ensures that at least one of those records were returned.

For the must-have tags we did an inner join, this means that unless the corresponding record is found, the result will be returned.

In this case we’re going to do an OUTER JOIN on the LEFT. The join being an OUTER JOIN means that the (contact) record will be returned, even if the corresponding contact_tag record was not found. Being a LEFT JOIN means that we’re going to base from our contacts table, we’re not going to return irrelevant contact_tags records.

/* Retrieve the fields we want */
SELECT id, first_name, last_name

/* Start with the customers table */
FROM customers c

/* For each MUST HAVE tag */
JOIN customer_tags mh1
ON c.id = mh1.customer_id
AND mh1.tag_id = 999
JOIN customer_tags mh2
ON c.id = mh2.customer_id
AND mh2.tag_id = 888

/* Join on the have-any tag relationships */
LEFT OUTER JOIN customer_tags ha1
ON c.id = ha1.customer_id
AND ha1.tag_id = 777
LEFT OUTER JOIN customer_tags ha2
ON c.id = ha2.customer_id
AND ha2.tag_id = 666
LEFT OUTER JOIN customer_tags ha3
ON c.id = ha3.customer_id
AND ha3.tag_id = 555

Now this in itself isn’t going to actually solve the issue we have around making sure we have 777, 666, or 555 – it is simply going to join on those relationships, if they exist.

This is where we need a WHERE statement to filter the results for us.

AND (
ha1.id IS NOT NULL
OR ha2.id IS NOT NULL
OR ha3.id IS NOT NULL
)

This statement means that if any of the have any (ha) tag relationships were found (the brackets thus returning TRUE) will be valid and returned

Does not have any of these tags

So the final part in this requirement, is to ensure that the records returned do not have any of the specified tags that we have to avoid.

We are going to do this in a very similar way to how we did the have any tags, see below

LEFT OUTER JOIN customer_tags nh1
ON c.id = nh1.customer_id
AND nh1.tag_id = 444

LEFT OUTER JOIN customer_tags nh2
ON c.id = nh2.customer_id
AND nh2.tag_id = 333

And, to make sure they were not returned, we’re going to do the following

AND nh1.id IS NULL
AND nh2.id IS NULL

So we’re LEFT OUTER JOINing on here for the same reason we left outer joined on the have any tags – it is valid for this tag to not exist, in fact, in this case we actively don’t want it to exist.

Putting the query together

/* Retrieve the fields we want */
SELECT id, first_name, last_name

/* Start with the customers table */
FROM customers c

/* For each MUST HAVE tag */
JOIN customer_tags mh1
ON c.id = mh1.customer_id
AND mh1.tag_id = 999
JOIN customer_tags mh2
ON c.id = mh2.customer_id
AND mh2.tag_id = 888

/* Join on the have-any tag relationships */
LEFT OUTER JOIN customer_tags ha1
ON c.id = ha1.customer_id
AND ha1.tag_id = 777
LEFT OUTER JOIN customer_tags ha2
ON c.id = ha2.customer_id
AND ha2.tag_id = 666
LEFT OUTER JOIN customer_tags ha3
ON c.id = ha3.customer_id
AND ha3.tag_id = 555

/* Join on the must not have tag relationships */
LEFT OUTER JOIN customer_tags nh1
ON c.id = nh1.customer_id
AND nh1.tag_id = 444
LEFT OUTER JOIN customer_tags nh2
ON c.id = nh2.customer_id
AND nh2.tag_id = 333

/* Ensure at least one of the have-any relationships is found */
WHERE (
ha1.id IS NOT NULL
OR ha2.id IS NOT NULL
OR ha3.id IS NOT NULL
)

/* Ensure not have relationships don't exist */
AND nh1.id IS NULL
AND nh2.id IS NULL

/* Never return duplicate customer */
GROUP BY c.id

/* Order by last updated customer records */
ORDER BY c.updated_at DESC

Adding further search filtering

I am not actually going to fully cover this, that will be on you with your application to make the best decisions, if you want to build a SQL query (not a prepared statement) then you may find this line handy, I can’t vouch for its safety.

$sanitised = DB::connection()->getPdo()->quote($string);

Dynamically Generating Your Query

All you really need to do is variablise (yes, I made that word up) your query generation;

I’m sorry, this isn’t PSR-2, I’m not going to lie to you, I wrote it straight into WordPress, and don’t want to spin up containers to run PHPCBF/PHPCS against it

// These are the must have tags
$mustHaveAllTags = [999,888,777];
// Must have 1+ of these tags
$mustHaveAnyTags = [666,555];
// Must not have any of these tags
$mustNotHaveTags = [444,333];
// Pagination of the query
$page = 1;
$perPage = 100;
// These are the components of our WHERE statement
$whereStatements = [];
// This is where we will hold our SQL query
$sql = '';
// Loop the must have tags
foreach($mustHaveAll Tags as $tagId){
$joinName = "mh$tagId";
$sql .= "
JOIN customer_tags $joinName
ON $joinName.customer_id = c.id
AND $joinName.tag_id = $tagId";
$wheres[] = "$joinName.id IS NOT NULL";
}
// Handle the must have any tags
if(!empty($mustHaveAnyTags)){
$anyWheres = [];
foreach($haveAny as $tagId){
$joinName = "ha$tagId";
$sql .= "
LEFT OUTER JOIN customer_tags $joinName
ON $joinName.customer_id = c.id
AND $joinName.tag_id = $tagId";
$anyWheres[] = "$joinName.id IS NOT NULL";
}
$wheres[] = "(" . implode(" OR ", $anyWheres) . ")";
}
// Not have any tags
foreach($notHave as $tagId){
$joinName = "nh$tagId";
$sql .= "
LEFT OUTER JOIN customer_tags
$joinName ON $joinName.customer_id = c.id
AND $joinName.tag_id = $tagId";
$wheres[] = "$joinName.id IS NULL";
}
// Append all the WHERE statements we created
$sql .= " WHERE " . implode(" AND ", $wheres);
// Group by the contact
$sql .= " GROUP BY c.id ";
// Calculate pagination
$offset = ($page-1) * $perPage;
$limit = $perPage;
// Append query limiting
$sql .= " LIMIT $offset, $limit ";

Next Steps

The next steps on this, to be honest, are beyond the scope of this article. What you now have is a query which will provide your tagged results as appropriate.

For me, I would now extract that functionality into a class, probably 2 (one responsible for providing a search “interface”, and one for generating SQL), and depending on how you want to use Eloquent or raw SQL either extract the custom searching (if you wanted to search by name, for example) into Eloquent, using prepared statements, or escape strings and queries (I would personally discourage the latter approach).

If you do want to do it in Eloquent you will incur a second query, but there are (from a programmatic perspective) some advantages (like not worrying about generating and executing prepared statement), if this is the case only select c.id from your query. From where run whatever normal Eloquent stuff you want, and simply add

$query->whereIn('id', $idsReturnedFromYourQuery);

What about performance? Joins are not performant

I’ve tested this kind of query with up to about 10 joins (10 tags), on a dataset of about 500,000 base records and it ran in about 300ms on a $40 Digital Ocean droplet.

Make sure that customer_tags has a compound index set on customer_id + tag_id to make the lookup faster, and foreign key both fields in that table.

In a lot of applications, this will be performant enough. When you start hitting the millions of records I am sure it will slow down and you will need a different approach.

Model Search and Filtering in Laravel

Hi guys, as I have written this functionality a few times at this point, and I have been asked how to do it a couple of times, too. That is Laravel searching and filtering, I am going to develop a simple composer package which I can use to allow users to search and filter Laravel (Eloquent) Models using PHP traits which can be used on Laravel controllers.

Assumptions

I am going to work on the assumption that you are comfortable with basics.

I shall also assume you’re fairly comfortable with the Illuminate HTTP Request object, a Controller, and Eloquent Model.

Concept

So the first thing we want to tackle, as a concept, is that the best way, in my experience, is to start with a query, and then modify it as required. This allows us to optionally do almost anything, but equally allows us to do nothing, and simply return an unfiltered set if we want to.

This point would be true outside of Eloquent too, even if working with vanilla SQL.

Starting Off

To start with you want something like this within your Controller.

public function index(Request $request)
{
// Of course you can change this if you don't want to select everything
$query = YourModel::select('*');
}

Now we have the $query object which we can work with as and where we want to.

Filtering Records

Excluding deleted/is deleted, as we’ll cover that later on.

Let’s say you now want to filter results on author_id just to keep things simple.

if(!empty($request->author)){
$query->where('author_id', '=', $request->author);
}

Of course take into account everything here, you may need to do sanitisation or validation or anything else in there.

Multiple Filters, Save Lines

If you have many fields on which you wish to do this I usually find something like this is better

$fields = ['author_id', 'category_id'];
foreach($fields as $field){
if(!empty($request->$field)){
$query->where($field, '=', $request->$field);
}
}

That example assumes the exposure of your field names, you could always use an associative array so you could do “author” maps to “author_id” and so on, if you were so inclined.

Fluffy Search

Now you might want to do a fluffy search, taking the assumption of posts with content and an author name, and a category name (stored within the model) you could do something like this

if(!empty($request->search)){
$searchFields = ['title','content','author_name','category_name'];
$query->where(function($query) use($request, $searchFields){
$searchWildcard = '%' . $request->search . '%';
foreach($searchFields as $field){
$query->orWhere($field, 'LIKE', $searchWildcard);
}
})
}

Handling Trash

You probably want something which allows you to see soft deleted models, if you’re using them. I do something like this

if($request->trash == true){
// Retrieve soft deleted models only, you can only do this if you're using soft deletion on this model
$query->onlyTrashed();
}

Ordering and Limiting

Now to handle ordering and pagination – which Eloquent beautifully does for us, all we need to do is let the user modify it if they want to.

// Set the query ordering
$query->order($request->orderBy ?? 'updated_at', $request->order ?? 'DESC');

$perPage = $request->per_page ?? 25;

// It is a good idea to make sure we cap this
if($perPage > $maximumPerPage){
$perPage = $maximumPerPage;
}

// Final part - get the results
$results = $query->paginate($perPage);

// You will now need to do something with your results, and return some kind of Response - this could be a JSON response or adding the results to the data returned to the view, depending on your context

There’s an assumption here that our default order is updated at, last updated first, and that if not specified, we want to return 25 pages.

Of course you can store this information wherever you want.

In Summary, and Further Reading

Rather than creating a really complicated set of rules to handle all the things you may need from your API calls, start with a query, and then optionally modify that query.

Further Reading

An introduction to Checksums

Hi everyone

This is a quick introduction into checksums, and practically how to use them, at the request of someone through the Ask Johno section of the blog.

I know what a checksum is, but I’m not sure how to implement or use one (in PHP) to achieve what I need.

Anonymous Asker

They go on to explain what they are trying to achieve, which is essentially to verify a file hasn’t changed (in this case it’s the HTTP document sent by an API) before doing something.

Firstly, a checksum is a very small data snippet (datum) which represents something larger.

In my experience there are 2 main reasons to use one, these are;

  1. To verify a change in something larger
  2. To verify that a piece of data matches from that which was sent by its originator (checksum on an API payload)

I will very briefly cover both of these with PHP examples.

Note; I am using MD5 for simplicity, but depending on your requirements this likely won’t be the best option for your use case.

Edit: Please see Further Reading at the bottom of this article for more information about hashing

Using a Checksum to Detect Changes

Now I don’t know what we’ve got, but whatever it is; we are going to need a string representation, 2 main ways of getting this:

$stringRepresentation = serialize($myThingToCheck);

or

$stringRepresentation = json_encode($myThingToCheck);

Personally, I would advise PHP’s serialize because that will work with, and instantiate objects. However, the choice is yours depending on your use case, JSON is smaller than a PHP serialization.

Now that we have a string, we need to make it small and easy to check. Something like this will work fine;

$checksum = md5($stringRepresentation);

Now to check for changes, we just need the last checksum that we stored that something happened on.

if($checksum != $oldChecksum){
echo 'Something has changed';
}

So that covers how to use a checksum to detect changes in pieces of data, which is useful if you’re having to poll for changes.

Using a Checksum to Verify Validity

This is something I’ve noticed a couple of times, particularly when working in the financial industry, and around certain payment gateways.

It usually looks something like this; but it does change per API integration so be aware of that and follow their own documentation.

$secret = 'my_secret_api_key';

$payload = [
'foo' => 'Bar',
'another' => 'Thing',
'datetime' => '2018-12-25 00:00:00'
];

$jsonPayload = json_encode($payload);

$checksum = md5($jsonPayload . $secret);

$payload['checksum'] = $checksum;

// Do the rest of your stuff here, including sending the payload etc.

One advantage of this approach is that you never expose the API key in plain text.

If you were to want to verify on the API so that you’re the provider, rather than integration, you would simply do these steps in the opposite order, so it would look something like this

// Assuming you've done everything you need, and now have the $payload array/object back

$secret = 'the_secret_youre_expecting';

$checksumProvided = $payload['checksum'];
unset($payload['checksum'];
$checksum = md5(json_encode($payload) . $secret);

if($checksum != $checksumProvided){
// If the checksums don't match, in theory the secret key provided was incorrect
}

I think that about covers this topic, as a brief introduction to checksums, and how they’re often used in PHP.

Further Reading

PHP: hash() for more information about the best ways to hash data

I have deliberately not gotten into the discussion over hashing algorithms, as it really is out of the scope of this article, and indeed a whole book could be written on the topic alone.

Thanks to u/artemix-org and u/BradChesney79 on Reddit for suggesting this edit.

Amazon Web Services – EC2 Quickstart

Good afternoon everyone

Today I am going to give a super-quick recipe to get your first EC2 up and running; if you’ve worked with virtual or dedicated servers before the cloud architecture thing can feel a bit overwhelming, or if you’ve not had to manage your own environments before. In any case, as you don’t get to install your own OS it can feel disorientating as you’r

This guide does not encourage best practices, it’s simply enough to get you running.

I am assuming you already have a rough understanding of what an EC2 is, and have signed up for your AWS account.

  1. Go to the AWS EC2 dashboard
  2. Click “Launch Instance”
  3. Click the “Select” button for the top row (Amazon Linux 2 AMI (HVM), SSD Volume Type)
  4. You now have some options to configure your instance, for the purposes of this article, I’m simply selecting t2.micro which is a free-tier general purpose EC2 – when you are setting up production environments, make sure you actually read through these options and select the appropriate decisions
  5. Click “Configure Instance Details” in the bottom right corner
  6. There are a whole bunch of options here which, of course, are really important for production EC2, however explaining these options are outside of the scope of this quick-start guide
  7. Click “Add Storage Details” in the bottom right corner
  8. Here you can configure the details of the storage you want your EC2 to have, for the purposes of this guide I’m going to keep the default 8gb (as I’m not going to need anything more than that)
  9. Click “Add Tags”, here you can add some tags to your instance for management and administrative purposes, again, this is out of scope for this tutorial
  10. Click “Configure Security Group”, from here you can configure the security policies around your EC2
  11. You will see a single rule configured for port 22 (SSH) connections, which allows all inbound traffic. I would advise changing the source. You can either do “My IP” which will detect and utilise your current IP, or you may wish to add further rules for multiple IPs/nets
  12. I am making the assumption you want to allow HTTP traffic to connect to your EC2 – click “Add Rule” in the bottom left, and select HTTP (this will allow inbound connections on port 80), if you want to allow HTTPS traffic (let’s be honest, all traffic should be HTTPS, it’s 2018) then you’ll need to add that rule, too, as it will allow traffic on port 443
  13. Continue adding rules as appropriate to allow connections to your instance
  14. Now you can “Review and Launch” – confirm your details and hit “launch”
  15. When you hit launch you will be prompted to either create a new key/value pair, or to utilise an existing one. Select new (which you will need to do unless you’ve set up one with AWS previously)
  16. Wait for your instance to be launched

Okay, so you now have an instance, and you’re going to want to do some stuff with it, presumably. All we’re going to do is shell into the server, and install Apache; then we’re going to point a DNS record so that web traffic hits that EC2.

  1. Check the box next to your corresponding EC2 and you’ll get some details in the bottom panel
  2. Click connect in the top bar and you’ll see some details, something like
    ssh -i "the-name-of-your.pem" ec2-user@ec2-xx-xx-xxx-xxx.eu-west-2.compute.amazonaws.com

    Making sure that the .pem file is pointing to the location you’ve stored your file from point 15 above. Now you will be shelled into your server.

  3. You will be prompted to fire a yum update
    sudo yum update
  4. I find the following steps get annoying unless I su to root
    sudo su root
  5. Install your web server
    yum install httpd
  6. Throw in a very simple virtual host declaration, just to accept web traffic (as I say – this is not best practice, it’s just to get you with a web-facing EC2!)
    nano /etc/httpd/conf/httpd.conf
  7. At the bottom of this file add something that looks roughly like this
    <VirtualHost *:80>
        ServerName your.domain.or.the.ec2.provided
        DocumentRoot /var/www/html
    </VirtualHost>
  8. Now you have a vhost to accept some web traffic, come out of nano and start httpd
    service httpd start
  9. Make sure the appropriate DNS records are set to point to either the IP address or CNAME set to the AWS subdomain (or you’re host file hacked)
  10. Visit the domain name you set in point 7 above

Voila! Very simple, not production ready, but you do now have an EC2 running and accepting web traffic, on the domain/subdomain of your choosing.

Until next time
JTC out

Bringing OOP functionality into WordPress

So if, like me, you’re an object orientated programmer and are suddenly asked to build functionality into WordPress it can feel like you’re trapped into procedural code and functions. Consequently; I thought I would whip up a little tutorial of how to work within WordPress, whilst maintaining your OOP integrity. The reason this is important is quite simple, if you’re building complex functionality into WordPress, you want to be able to use OOP principles and methodologies.

Don’t worry or get confused, themes vs plugins

Essentially, these are the same thing! They use the same library of code, the APIs you can use within WordPress are the same. You could be asked to implement the code in either, and you’ll do it the same way. The real difference is whether you want the functionality to be exposed if the website changes theme, if that’s the case you’ll move the code into a WordPress.

PSR-4 Autoloading

The first thing you’ll want to do is get some PSR-4 autoloading running so that you can use your classes without having a huge set of require, require_once or include statements which can be a nightmare to maintain. If you’re working in a theme put this code into your functions.php – if you’re in a plugin then I would advise you do this in the registration file which might be wp-contents/plugins/yourplugin.php or it might be wp-contents/plugins/yourplugin/functions.php. If in doubt, look for the declaration of the plugin – which will be a PHP code comment as seen here.

I will use JohnoTheCoder as my vendor name and PluginTutorial as the project namespace. The file we’re putting this code into is called whenever WordPress is loaded (because the theme is initialised, as are the plugins) – so you’ll want something like this

/*
    Declare the autoloading - you can repeat this for all your different namespaces if you wish, for example if you're bringing in code from other repositories
*/
spl_autoload_register(function ($class) {

    // The classes I want to autoload are within the JohnoTheCoder\PluginTutorial namespace
    $prefix = 'JohnoTheCoder\\PluginTutorial\\';

    // The class being loaded is not within my namesapce, return early
    if (strncmp($prefix, $class, strlen($prefix)) !== 0) {
        return;
    }

    // All we're doing here is saying that your namespace classes are found within the src subdirectory
    $file = __DIR__ . '/src/' . str_replace('\\', '/', str_replace($prefix, "", $class) . '.php';

    // If the file does not exist - throw an exception with a helpful message
    if (!file_exists($file)) {
        Throw new Exception("Class [$class] not found in expected location [$file]");
    }

    // The file does exist, so require the file
    require_once($file);

});

What this means now is that you can reference JohnoTheCoder\PluginTutorial classes, and they will be looked for within your plugin, for example JohnoTheCoder\PluginTutorial\Subspace\MyClass would be found in plugin-directory/src/Subspace/MyClass.php – of course you can modify this to follow the naming conventions you have in place.

That’s it for this short tutorial – I’ll do some more “OOP in WP” tutorials in the future, to enhance maintainability and help encourage true OOP development within WordPress plugins and themes.