Measuring strings

int strlen ( string source)

mixed count_chars ( string string [, int mode])

mixed str_word_count ( string string [, int format])

Measuring a string and its contents can be done in three separate ways. The easiest (and most "obvious") way is to count the characters in the string, which is the job of the strlen() function: it takes just one parameter (the string) and returns its length. Strictly speaking that length is measured in bytes, which is the same as the number of characters for plain ASCII text such as the strings used here. It is so easy to use that it barely merits an example, but just to make sure we're both reading from the same song sheet:

    print strlen("Foo") . "\n"; // 3
    print strlen("Goodbye, Perl!") . "\n"; // 14

There really is not much else to learn about strlen() - it is a very simple function. Having said that, it is very useful, and is likely to crop up in many scripts that you write.
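One detail is easier to see in code than in prose: strlen() counts bytes, so whitespace, newlines and multi-byte characters all affect the result. The following is a minimal sketch; it assumes the accented character is encoded as UTF-8, which is why it is written out as two explicit bytes.

```php
<?php
// strlen() counts bytes, so whitespace and escape sequences all count.
$empty   = strlen("");          // 0: an empty string has no bytes
$padded  = strlen("  Foo  ");   // 7: two spaces, three letters, two spaces
$newline = strlen("Foo\n");     // 4: "\n" is one byte inside double quotes

// Assumption: UTF-8 text. "\xC3\xA9" is the two-byte UTF-8 encoding of
// "é", so strlen() reports 2 for a single visible character.
$accented = strlen("\xC3\xA9"); // 2

print "$empty $padded $newline $accented\n"; // 0 7 4 2
```

If you need a count of characters rather than bytes, mb_strlen() does that, provided the mbstring extension is installed.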

The other two functions, count_chars() and str_word_count(), measure the contents of a string in different ways: count_chars(), when given a string, returns an array containing the characters used in that string and how many times each one was used, whereas calling str_word_count() with just the string returns the number of words used.

Using count_chars() is complicated somewhat by the fact that it returns an array of exactly 256 elements by default: one for every possible byte value from 0 to 255, with each key being an ASCII code and each value being how often that character appears. You can work around this by filtering the array to remove items that have a value (frequency) of 0, or, alternatively, you can pass a second parameter to the function. If you pass 1, only characters with a frequency greater than 0 are listed; if you pass 2, only characters with a frequency equal to 0 are listed.
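To make the difference between the modes concrete, here is a small sketch using the string "hello", which is my own example rather than anything from the original text. It compares the size of the array in each mode and shows that filtering the default result by hand gives the same array as passing 1.

```php
<?php
$str = "hello";

// Mode 0 (the default): one entry for every possible byte value, 0 to 255.
$all = count_chars($str);
$total = count($all);           // 256

// Mode 1: only the bytes that occur in the string.
$used = count_chars($str, 1);   // keys 101 (e), 104 (h), 108 (l), 111 (o)

// Mode 2: only the bytes that never occur.
$unused = count_chars($str, 2);
$missing = count($unused);      // 252, i.e. 256 minus the 4 distinct bytes

// Filtering mode 0 by hand: array_filter() drops entries whose value is 0
// and keeps the original keys, so the result matches mode 1.
$filtered = array_filter($all);

print "$total entries, " . count($used) . " used, $missing unused\n";
```

In practice, passing 1 is the shorter route whenever you only care about the characters that are actually present.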

Similarly, you can pass a second parameter to str_word_count() to make it do other things. By default, it just returns the total number of words found in the string, counting repeated words each time they appear. However, if you pass 1 as the second parameter it will return an array of the words found, and passing 2 does the same, except the key of each word will be set to the position at which that word was found inside the string.
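Because the default count includes repeated words, counting distinct words takes one extra step. The sketch below uses a short sample sentence of my own and shows one possible approach, de-duplicating the word list with array_unique(); it treats "The" and "the" as different words, since no case folding is done.

```php
<?php
$str = "the cat and the hat";

$count = str_word_count($str);        // 5: repeats are counted every time
$words = str_word_count($str, 1);     // list of the words, in order
$positions = str_word_count($str, 2); // same words, keyed by starting offset

// To count distinct words, de-duplicate the list yourself.
$unique = count(array_unique($words)); // 4: "the" appears twice

print "$count words, $unique unique\n"; // 5 words, 4 unique
```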

Here is an example of both functions in action:

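    $str = "This is a test, only a test, and nothing but a test.";
    $a = count_chars($str, 1);     // character frequencies, keyed by ASCII code
    $b = str_word_count($str, 1);  // list of the words found
    $c = str_word_count($str, 2);  // words keyed by their position in $str
    $d = str_word_count($str);     // total number of words
    print_r($a);
    print_r($b);
    print_r($c);
    echo "There are $d words in the string\n";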

That should output the following (note that I have taken out much of the whitespace to save space):

Array ( [32] => 11 [44] => 2 [46] => 1 [84] => 1 [97] => 4 [98] => 1 [100] => 1 [101] => 3 [103] => 1 [104] => 2 [105] => 3 [108] => 1 [110] => 4 [111] => 2 [115] => 5 [116] => 8 [117] => 1 [121] => 1)
Array ( [0] => This [1] => is [2] => a [3] => test [4] => only [5] => a [6] => test [7] => and [8] => nothing [9] => but [10] => a [11] => test )
Array ( [0] => This [5] => is [8] => a [10] => test [16] => only [21] => a [23] => test [29] => and [33] => nothing [41] => but [45] => a [47] => test )
There are 12 words in the string

In the first array printout, the numbers inside the square brackets (the array keys) are ASCII codes, and the other numbers (the array values) are the frequencies of each character; for example, [116] => 8 means that a lowercase "t" appears eight times. In the second printout, the array keys are irrelevant, but the array values are the list of the words found - note that the comma and full stop are not in there as they are not considered words. In the third printout, the array keys mark where the first letter of the word in the value was found, thus "0" means "This" was found at the beginning of the string. The last printout shows the default word-counting behaviour of str_word_count().
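If the numeric keys are hard to read, both arrays can be turned back into something more familiar. The following sketch, which is an illustration rather than part of the original example, uses chr() to convert each ASCII code into its character and substr() to confirm that each position key really does point at the start of its word. The variable names are mine.

```php
<?php
$str = "This is a test, only a test, and nothing but a test.";

// Turn the ASCII-code keys from count_chars() into readable characters.
$letters = [];
foreach (count_chars($str, 1) as $byte => $frequency) {
    $letters[chr($byte)] = $frequency;
}
print "t appears {$letters['t']} times\n"; // t appears 8 times

// Each key from str_word_count($str, 2) is the word's starting offset,
// so substr() at that offset should give back the same word.
$allMatch = true;
foreach (str_word_count($str, 2) as $offset => $word) {
    if (substr($str, $offset, strlen($word)) !== $word) {
        $allMatch = false;
    }
}
print $allMatch ? "Every offset matches its word\n" : "Found a mismatch\n";
```

One caveat with the first loop: PHP converts numeric string keys such as "1" into integers, so a string containing digits would end up with integer keys for those characters.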


Copyright ©2015 Paul Hudson.