A keyword and phrase extraction library based on the Rapid Automatic Keyword Extraction (RAKE) algorithm. Keywords describe the main topics of a document or text, and keyword extraction pulls those important words and phrases out of the text automatically.

Extracted keywords can be used for things like:

  • Building a list of useful tags out of a larger text
  • Building search indexes and search engines
  • Grouping similar content by topic

Extracted phrases can be used for things like:

  • Highlighting important areas of a larger text
  • Language or documentation analysis
  • Building intelligent searches based on contextual terms

This library gives PHP developers an easy way to get a list of keywords and phrases from a string of text. It is based on a smaller, unmaintained project called RAKE-PHP, which is itself a translation of a Python implementation simply called RAKE.

Installing RAKE-PHP-Plus with and without Composer

Installation

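With Composer

$ composer require donatello-za/rake-php-plus

Alternatively, add the package to the "require" section of composer.json:

{
    "require": {
        "donatello-za/rake-php-plus": "^1.0"
    }
}

Then load Composer's autoloader and import the class:

<?php
require 'vendor/autoload.php';

use DonatelloZa\RakePlus\RakePlus;

Without Composer

Require the library files directly, then import the class:

<?php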

require 'path/to/AbstractStopwordProvider.php';
require 'path/to/ILangParseOptions.php';
require 'path/to/LangParseOptions.php';
require 'path/to/StopwordArray.php';
require 'path/to/StopwordsPatternFile.php';
require 'path/to/StopwordsPHP.php';
require 'path/to/RakePlus.php';

use DonatelloZa\RakePlus\RakePlus;

Examples of how to use RAKE-PHP-Plus

use DonatelloZa\RakePlus\RakePlus;

$text = "Criteria of compatibility of a system of linear Diophantine equations, " .
    "strict inequations, and nonstrict inequations are considered. Upper bounds " .
    "for components of a minimal set of solutions and algorithms of construction " .
    "of minimal generating sets of solutions for all types of systems are given.";

$phrases = RakePlus::create($text)->get();

print_r($phrases);

Output:

Array
(
    [0] => criteria
    [1] => compatibility
    [2] => system
    [3] => linear diophantine equations
    [4] => strict inequations
    [5] => nonstrict inequations
    [6] => considered
    [7] => upper bounds
    [8] => components
    [9] => minimal set
    [10] => solutions
    [11] => algorithms
    [12] => construction
    [13] => minimal generating sets
    [14] => types
    [15] => systems
)
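
The phrases above hint at how RAKE forms its candidates: the text is cut at punctuation and at stopwords, and the runs of words left in between become candidate phrases. The following is a minimal sketch of that idea, not the library's own code. The function name candidate_phrases_sketch and the two-word stopword list are assumptions made for illustration; the library itself loads its stopwords from language files.

```php
<?php
// Illustrative sketch only; not the library's implementation.

/**
 * Split text into candidate phrases at punctuation and stopwords.
 *
 * @param string   $text      Input text (ASCII assumed for simplicity).
 * @param string[] $stopwords Hypothetical, tiny stopword list.
 * @return string[] Candidate phrases in order of appearance.
 */
function candidate_phrases_sketch(string $text, array $stopwords): array
{
    // Cut the lower-cased text into fragments at punctuation marks.
    $fragments = preg_split('/[.,;:!?()]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // Build one pattern that matches any stopword as a whole word.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $stopwords);
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/';

    $phrases = [];
    foreach ($fragments as $fragment) {
        // Whatever remains between stopwords is a candidate phrase.
        foreach (preg_split($pattern, $fragment) as $piece) {
            $piece = trim(preg_replace('/\s+/', ' ', $piece));
            if ($piece !== '') {
                $phrases[] = $piece;
            }
        }
    }

    return $phrases;
}

$phrases = candidate_phrases_sketch(
    'Criteria of compatibility of a system of linear Diophantine equations',
    ['of', 'a']
);
print_r($phrases);
```

With these inputs the sketch yields criteria, compatibility, system and linear diophantine equations, which matches the first four entries of the output above.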

Another example of RAKE-PHP-Plus in action

use DonatelloZa\RakePlus\RakePlus;

$text = "Criteria of compatibility of a system of linear Diophantine equations, " .
    "strict inequations, and nonstrict inequations are considered. Upper bounds " .
    "for components of a minimal set of solutions and algorithms of construction " .
    "of minimal generating sets of solutions for all types of systems are given.";

// Note: en_US is the default language.
$rake = RakePlus::create($text, 'en_US');

// 'asc' is optional and is the default sort order
$phrases = $rake->sort('asc')->get();
print_r($phrases);

Output:

Array
(
    [0] => algorithms
    [1] => compatibility
    [2] => components
    [3] => considered
    [4] => construction
    [5] => criteria
    [6] => linear diophantine equations
    [7] => minimal generating sets
    [8] => minimal set
    [9] => nonstrict inequations
    [10] => solutions
    [11] => strict inequations
    [12] => system
    [13] => systems
    [14] => types
    [15] => upper bounds
)

// Sort in descending order
$phrases = $rake->sort('desc')->get();
print_r($phrases);

Output:

Array
(
    [0] => upper bounds
    [1] => types
    [2] => systems
    [3] => system
    [4] => strict inequations
    [5] => solutions
    [6] => nonstrict inequations
    [7] => minimal set
    [8] => minimal generating sets
    [9] => linear diophantine equations
    [10] => criteria
    [11] => construction
    [12] => considered
    [13] => components
    [14] => compatibility
    [15] => algorithms
)

// Sort the phrases by score and return the scores
$phrase_scores = $rake->sortByScore('desc')->scores();
print_r($phrase_scores);

Output:

Array
(
    [linear diophantine equations] => 9
    [minimal generating sets] => 8.5
    [minimal set] => 4.5
    [strict inequations] => 4
    [nonstrict inequations] => 4
    [upper bounds] => 4
    [criteria] => 1
    [compatibility] => 1
    [system] => 1
    [considered] => 1
    [components] => 1
    [solutions] => 1
    [algorithms] => 1
    [construction] => 1
    [types] => 1
    [systems] => 1
)
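
The scores above are consistent with the usual RAKE scoring rule: every word gets a score of degree divided by frequency, where frequency is how often the word occurs across the candidate phrases and degree is the summed length of the phrases it occurs in, and a phrase scores the sum of its words. The sketch below reproduces a few of the numbers under that assumption. It is not the library's code, and rake_score_sketch is a hypothetical name.

```php
<?php
// Illustrative sketch of standard RAKE scoring; NOT the library's actual code.

/**
 * Score candidate phrases: each word gets degree / frequency, and a
 * phrase's score is the sum of its word scores.
 *
 * @param string[] $phrases Candidate phrases, already split on stopwords.
 * @return array<string, float> Phrase => score.
 */
function rake_score_sketch(array $phrases): array
{
    $frequency = [];
    $degree = [];

    foreach ($phrases as $phrase) {
        $words = explode(' ', $phrase);
        $length = count($words);
        foreach ($words as $word) {
            $frequency[$word] = ($frequency[$word] ?? 0) + 1;
            // Degree counts every word in the phrase, the word itself included.
            $degree[$word] = ($degree[$word] ?? 0) + $length;
        }
    }

    $scores = [];
    foreach ($phrases as $phrase) {
        $score = 0.0;
        foreach (explode(' ', $phrase) as $word) {
            $score += $degree[$word] / $frequency[$word];
        }
        $scores[$phrase] = $score;
    }

    return $scores;
}

$scores = rake_score_sketch([
    'linear diophantine equations',
    'minimal set',
    'minimal generating sets',
    'upper bounds',
    'solutions',
]);
print_r($scores);
```

In this sketch "minimal" occurs in two phrases with a combined length of 5, so it scores 2.5. That is why "minimal generating sets" comes to 2.5 + 3 + 3 = 8.5 and "minimal set" to 2.5 + 2 = 4.5, the same values shown in the output above.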

// Extract phrases from a new string on the same RakePlus instance. Using the
// same RakePlus instance is faster than creating a new instance as the
// language files do not have to be re-loaded and parsed.

$text = "A fast Fourier transform (FFT) algorithm computes...";
$phrases = $rake->extract($text)->sort()->get();
print_r($phrases);

Output:

Array
(
    [0] => algorithm computes
    [1] => fast fourier transform
    [2] => fft
)
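
To connect scored output back to the tag-building use case mentioned at the start, one option is to keep only the highest-scoring phrases. This is a minimal sketch that works on a plain phrase => score array like the one printed earlier. The helper top_tags_sketch, its parameters, and the threshold value are hypothetical and not part of the library.

```php
<?php
// Hypothetical helper, not part of the library: keep the best-scoring phrases as tags.

/**
 * @param array<string, int|float> $phraseScores Phrase => score.
 * @param int                      $limit        Maximum number of tags to return.
 * @param float                    $minScore     Phrases scoring lower are dropped.
 * @return string[] Tags, highest score first.
 */
function top_tags_sketch(array $phraseScores, int $limit, float $minScore = 1.0): array
{
    // Drop phrases below the threshold (keys are preserved).
    $kept = array_filter($phraseScores, function ($score) use ($minScore) {
        return $score >= $minScore;
    });

    // Sort by score, highest first, keeping the phrase keys attached.
    arsort($kept);

    return array_slice(array_keys($kept), 0, $limit);
}

// Sample values copied from the scored output shown earlier.
$phraseScores = [
    'criteria' => 1,
    'linear diophantine equations' => 9,
    'minimal set' => 4.5,
    'minimal generating sets' => 8.5,
    'upper bounds' => 4,
];

$tags = top_tags_sketch($phraseScores, 3, 2.0);
print_r($tags);
```

Here the three tags kept are linear diophantine equations, minimal generating sets and minimal set. Raising the threshold or lowering the limit trades coverage for more specific tags.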

License: MIT license