Building the associative array – ideas
When we talk about arrays, we usually mean a vector of something: primitives, objects or even other arrays. But there are many situations where we need to carry extra information with our data. We could use nested arrays, but that doesn’t make a specific item within the array easy to identify. To achieve that, we should use custom keys. They provide easy access to any element of the array, as long as we know the key corresponding to that value.
There are several reasons to use associative arrays instead of simple vectors, e.g. readability: named keys are more meaningful than numeric indexes. Instead of dwelling on the whys, imagine a situation where we could use a hashmap, and let’s focus on how we can build an associative array from a vector.
Example scenario
In our application, we have users from different parts of the world. During registration, users can select their country. In our database, we store the country as an ISO 3166-1 code. The user should see the full country name instead.
We already have an array of Country objects, whose definition is presented below:
class Country
{
    // Full, human-readable country name
    private $name;
    // Country code stored in the database
    private $code;

    public function __construct(string $name, string $code)
    {
        $this->name = $name;
        $this->code = $code;
    }

    public function getName(): string
    {
        return $this->name;
    }

    public function getCode(): string
    {
        return $this->code;
    }
}
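The snippets in this article use a $countries variable that is never shown. A minimal sketch of how it might be populated, assuming a handful of hypothetical sample countries (in the real application these objects would be built from database rows):

```php
<?php
// Country class from the article, condensed so this snippet runs on its own.
class Country
{
    private $name;
    private $code;

    public function __construct(string $name, string $code)
    {
        $this->name = $name;
        $this->code = $code;
    }

    public function getName(): string { return $this->name; }
    public function getCode(): string { return $this->code; }
}

// Hypothetical sample data, used only for illustration.
$countries = [
    new Country('Poland', 'PL'),
    new Country('Germany', 'DE'),
    new Country('Japan', 'JP'),
];

foreach ($countries as $country) {
    // Prints one "CODE: Name" line per country, e.g. "PL: Poland"
    echo $country->getCode(), ': ', $country->getName(), PHP_EOL;
}
```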
One of our frontend components renders a Combo (known as the <select> node), and it expects an associative array as input, in the format ['value' => 'text visible for user'].
We need to create this array.
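For reference, this is the target structure for three sample countries, plus a rough idea of how a component could turn it into <option> elements. The rendering loop is only an assumption; the actual Combo component is not part of this article.

```php
<?php
// Target structure: option value (country code) => text visible to the user.
$countriesSelect = [
    'PL' => 'Poland',
    'DE' => 'Germany',
    'JP' => 'Japan',
];

// Hypothetical rendering, only to show how keys and values map onto a <select>.
$html = "<select name=\"country\">\n";
foreach ($countriesSelect as $value => $text) {
    $html .= sprintf(
        "  <option value=\"%s\">%s</option>\n",
        htmlspecialchars($value),
        htmlspecialchars($text)
    );
}
$html .= "</select>\n";

echo $html;
```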
Approach #0: the basic loop
function prepareCountriesForCombo(array $countries): array
{
    $output = [];
    foreach ($countries as $country) {
        $output[$country->getCode()] = $country->getName();
    }
    return $output;
}

$countriesSelect = prepareCountriesForCombo($countries);
This is the most basic approach I know. I often question solutions that use raw loops, because they are low-level constructs. But if they are encapsulated within a well-described function, that’s O.K.
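A quick, self-contained check of the basic loop, using the condensed Country class and two hypothetical countries:

```php
<?php
class Country
{
    private $name;
    private $code;

    public function __construct(string $name, string $code)
    {
        $this->name = $name;
        $this->code = $code;
    }

    public function getName(): string { return $this->name; }
    public function getCode(): string { return $this->code; }
}

// Approach #0: a dedicated function with a plain foreach loop.
function prepareCountriesForCombo(array $countries): array
{
    $output = [];
    foreach ($countries as $country) {
        // The code becomes the key, the full name becomes the visible text.
        $output[$country->getCode()] = $country->getName();
    }
    return $output;
}

$countries = [new Country('Poland', 'PL'), new Country('Germany', 'DE')];
$countriesSelect = prepareCountriesForCombo($countries);
// $countriesSelect is now ['PL' => 'Poland', 'DE' => 'Germany']
```

One consequence of plain key assignment worth knowing: if two elements produce the same key, the later one silently overwrites the earlier one.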
Creating a function designed to produce specific data for another component seems like a nice idea. But suppose we need to deliver data to a different Combo. It would be nice to use the same function with different data, so I need to generalize this approach.
Approach #1: Generic way to produce associative array
Besides the array containing the data, we need to pass two extra functions. The former transforms an element into the text visible to the user, and the latter transforms an element into the key of the associative array.
The function could look like this:
function mapForCombo(array $array, Closure $fnValue, Closure $fnKey): array
{
    $output = [];
    foreach ($array as $element) {
        $output[$fnKey($element)] = $fnValue($element);
    }
    return $output;
}

$countriesSelect = mapForCombo(
    $countries,
    function (Country $country) { return $country->getName(); }, // $fnValue: visible text
    function (Country $country) { return $country->getCode(); }  // $fnKey: option value
);
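Because mapForCombo no longer knows anything about Country, it can feed another Combo as well. A sketch with a hypothetical second data source, languages stored as plain arrays, following the argument order of the signature above (visible text first, key second):

```php
<?php
// Generic helper from Approach #1: $fnValue produces the visible text, $fnKey the key.
function mapForCombo(array $array, Closure $fnValue, Closure $fnKey): array
{
    $output = [];
    foreach ($array as $element) {
        $output[$fnKey($element)] = $fnValue($element);
    }
    return $output;
}

// Hypothetical data for a different Combo: languages as plain associative arrays.
$languages = [
    ['iso' => 'en', 'label' => 'English'],
    ['iso' => 'pl', 'label' => 'Polish'],
];

$languagesSelect = mapForCombo(
    $languages,
    function (array $language) { return $language['label']; }, // visible text
    function (array $language) { return $language['iso']; }    // option value
);
// $languagesSelect is ['en' => 'English', 'pl' => 'Polish']
```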
It looks pretty good, and we could probably stop discussing this boring feature here. Still, let’s try to find other implementations for this specific case.
Approach #2: use array map functions
I really like array functions; they are very powerful. We can transform the keys and values of the associative array independently and then combine them into a single array.
function mapForCombo(array $array, Closure $fnValue, Closure $fnKey): array
{
    $keys = array_map($fnKey, $array);
    $values = array_map($fnValue, $array);
    return array_combine($keys, $values);
}

$countriesSelect = mapForCombo(
    $countries,
    function (Country $country) { return $country->getName(); }, // $fnValue: visible text
    function (Country $country) { return $country->getCode(); }  // $fnKey: option value
);
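To see what each step does, here are the intermediate arrays for two hypothetical countries. array_combine pairs elements by position, so it relies on both array_map calls walking the input in the same order, which they do.

```php
<?php
class Country
{
    private $name;
    private $code;

    public function __construct(string $name, string $code)
    {
        $this->name = $name;
        $this->code = $code;
    }

    public function getName(): string { return $this->name; }
    public function getCode(): string { return $this->code; }
}

$countries = [new Country('Poland', 'PL'), new Country('Germany', 'DE')];

// Pass 1: extract the keys (country codes).
$keys = array_map(function (Country $country) { return $country->getCode(); }, $countries);
// Pass 2: extract the visible texts (country names).
$values = array_map(function (Country $country) { return $country->getName(); }, $countries);
// Pass 3: pair them up by position: $keys[0] => $values[0], $keys[1] => $values[1], ...
$countriesSelect = array_combine($keys, $values);

// $keys            is ['PL', 'DE']
// $values          is ['Poland', 'Germany']
// $countriesSelect is ['PL' => 'Poland', 'DE' => 'Germany']
```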
The code is much shorter, but under the hood PHP has more work to do. Although there is no visible loop in userland, the PHP engine must iterate over the data at least 3 times (twice for array_map and once for array_combine). It seems slow and not very efficient.
Approach #3: map via array_reduce
The next approach is to use the array_reduce function instead of mapping. I described what this function does in one of my previous posts. Put simply, it reduces an array to a single value. But nobody said that the single value couldn’t be an array!
function mapForCombo(array $array, Closure $fnValue, Closure $fnKey): array
{
    return array_reduce($array, function ($acc, $element) use ($fnKey, $fnValue) {
        $acc[$fnKey($element)] = $fnValue($element);
        return $acc;
    }, []);
}

$countriesSelect = mapForCombo(
    $countries,
    function (Country $country) { return $country->getName(); }, // $fnValue: visible text
    function (Country $country) { return $country->getCode(); }  // $fnKey: option value
);
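To illustrate that the reduced "single value" is an array that grows with every call, the sketch below records a snapshot of the accumulator after each step. The $snapshots variable is a hypothetical debug log that exists only for this illustration.

```php
<?php
class Country
{
    private $name;
    private $code;

    public function __construct(string $name, string $code)
    {
        $this->name = $name;
        $this->code = $code;
    }

    public function getName(): string { return $this->name; }
    public function getCode(): string { return $this->code; }
}

$countries = [new Country('Poland', 'PL'), new Country('Germany', 'DE')];

$snapshots = []; // hypothetical debug log, captured by reference below
$countriesSelect = array_reduce(
    $countries,
    function (array $acc, Country $country) use (&$snapshots) {
        $acc[$country->getCode()] = $country->getName();
        $snapshots[] = $acc; // state of the accumulator after this element
        return $acc;         // becomes $acc for the next element
    },
    [] // initial accumulator: an empty array
);

// $snapshots[0] is ['PL' => 'Poland']
// $snapshots[1] is ['PL' => 'Poland', 'DE' => 'Germany']
```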
The only thing we have to do is write a callback that assigns the transformed value to a specific key in the accumulator (similar to the 1st approach). The loop logic is hidden within the array_reduce function, so we only need to care about the correct assignment.
Which solution is better?
Tests
Personally, I don’t like running or reading benchmarks for things that usually don’t matter. This time, however, I had collected some ideas and wanted to validate them this way. That’s why I tested which implementation works fastest.
Test file and environment
I prepared a simple test file; you can view it in this GitHub Gist. Although it’s not complicated, it’s enough to see how the implementations perform relative to each other. In short, I execute each function 5000 times and measure the duration of the execution.
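The Gist itself is not reproduced here. As a rough illustration of the idea only, a minimal timing harness could look like the sketch below; the function names, dataset size and iteration count are assumptions, not the values from the original test. It also checks that all three implementations return the same array, which is worth confirming before comparing their speed.

```php
<?php
class Country
{
    private $name;
    private $code;

    public function __construct(string $name, string $code)
    {
        $this->name = $name;
        $this->code = $code;
    }

    public function getName(): string { return $this->name; }
    public function getCode(): string { return $this->code; }
}

// Hypothetical names for the implementations from Approaches #1 to #3.
function mapWithForeach(array $array, Closure $fnValue, Closure $fnKey): array
{
    $output = [];
    foreach ($array as $element) {
        $output[$fnKey($element)] = $fnValue($element);
    }
    return $output;
}

function mapWithArrayMap(array $array, Closure $fnValue, Closure $fnKey): array
{
    return array_combine(array_map($fnKey, $array), array_map($fnValue, $array));
}

function mapWithReduce(array $array, Closure $fnValue, Closure $fnKey): array
{
    return array_reduce($array, function ($acc, $element) use ($fnKey, $fnValue) {
        $acc[$fnKey($element)] = $fnValue($element);
        return $acc;
    }, []);
}

// Assumed parameters, not taken from the original Gist.
$size = 100;
$iterations = 1000;

$countries = [];
for ($i = 0; $i < $size; $i++) {
    $countries[] = new Country('Country ' . $i, sprintf('C%03d', $i));
}

$fnValue = function (Country $country) { return $country->getName(); };
$fnKey = function (Country $country) { return $country->getCode(); };

$results = [];
foreach (['mapWithForeach', 'mapWithArrayMap', 'mapWithReduce'] as $implementation) {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $output = $implementation($countries, $fnValue, $fnKey);
    }
    $results[$implementation] = $output;
    // Total wall-clock time for all iterations of this implementation.
    printf("%-16s %.3fs\n", $implementation, microtime(true) - $start);
}
```

Wall-clock timings from microtime(true) fluctuate between runs, so repeating the measurement and averaging the results gives a more reliable picture.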
At first I tried to run this script in my local environment, but I came to the conclusion that the better option was to use a completely separate server. I rented a 2 GB RAM / 2 CPU droplet from DigitalOcean with a ready-to-use LAMP stack on board and ran the test.
root@lamp-2gb-nyc3-01:~# php -v
PHP 7.0.22-0ubuntu0.16.04.1 (cli) ( NTS )
The test generates output, which I copied and pasted as CSV into Excel. This allowed me to easily calculate the average time and plot a simple chart.
Test results
I have to admit that the results were really surprising to me. First, please take a look at the final outcomes.
- Approach #1 (foreach): 0.283 s
- Approach #2 (array_map): 0.253 s
- Approach #3 (array_reduce): 1.054 s
Differences are more visible on a simple bar chart.
My first impression was a mixture of confusion and disbelief. The solution that seemed to be the best turned out to be the least efficient. It shows that we should always test and validate our ideas.
I personally use the array_reduce approach quite often. Fortunately, the test was deliberately exaggerated to show the differences between the approaches. For smaller datasets, the differences are not noticeable.
That doesn’t mean you should not use the array_reduce function. It has specific use cases and meaning, so don’t hesitate to use it if it fits your needs.
Summary
I’m not sure whether this experiment will change the way I write code. To be honest, it’s only a micro-optimisation that becomes visible on a bigger scale. Although it sometimes does matter, e.g. if you have to write highly efficient algorithms, for most of us it’s just an unnecessary concern.
On the other hand, it reveals something completely different. We believe that things work faster because they seem to be faster: they execute fewer loops and contain less complicated logic and conditions. This example shows that this is not true in all cases.
As you may have noticed, there isn’t a golden rule for how to transform data into an associative array. Simplicity is key, but we should always test, confirm and compare our solutions with others. This applies not only to array transformations but to many other things as well. Maybe your next solution will be better, simpler and faster?
Resources
Featured photo by G. Crescoli on Unsplash