Full-text search with Sphinx works fine for retrieving content.
The problem comes after the search: ordering the results. I started with reverse chronological order, but then the results I actually wanted, the ones closest to my search terms, did not appear on the first few pages. That is not a good experience.
So I used PHP's built-in similar_text function to compute the similarity between the search keyword and each article's title or description, and then sorted the results in descending order of that similarity.
I wrapped this in a function. It is only an example; adapt it to your needs:
function similar_arr($array, $keyword, $arr_key = 'title')
{
    $data = [];
    // Score each row by its similarity to the keyword
    foreach ($array as $key => $value)
    {
        similar_text($value[$arr_key], $keyword, $percent);
        $value['percent'] = $percent;
        $data[] = $value;
    }
    // Take the percent column from the array and return a one-dimensional array
    $percent = array_column($data, 'percent');
    // Sort by percent, highest first
    array_multisort($percent, SORT_DESC, $data);
    return $data;
}

// $data is a two-dimensional array (a list of article rows)
$res = similar_arr($data, 'wechat Mini Program');
var_dump($res);
This works, but similar_text does not handle Chinese well: it compares bytes rather than characters, so the scores it gives for Chinese text can be fairly arbitrary.
So what now? similar_text clearly won't do here.
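To make the byte-level behaviour concrete, here is a minimal sketch. The two sample characters are my own choice for illustration, not data from the search results, and it assumes UTF-8 strings, where a Chinese character usually takes three bytes.

```php
<?php
// Two different Chinese characters, written as Unicode escapes so the
// example does not depend on the file's encoding (requires PHP 7+).
$a = "\u{4FE1}"; // UTF-8 bytes: E4 BF A1
$b = "\u{4F60}"; // UTF-8 bytes: E4 BD A0

// strlen() counts bytes, not characters.
$bytes = strlen($a);

// similar_text() also works on bytes: the two characters share the
// leading byte E4, so they are reported as partly similar even though
// no character matches.
$common = similar_text($a, $b, $percent);

printf("bytes=%d common=%d percent=%.2f\n", $bytes, $common, $percent);
```

Two unrelated characters end up with a non-zero score, which is the kind of noise that pushes irrelevant articles up the result list.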
There are plenty of experts to be found on Baidu, and there I found a class that calculates similarity for Chinese text. I changed it slightly.
The whole class is as follows:
LcsController.php
namespace App\Http\Controllers\index;

/**
 * @name: Article similarity calculation class
 * @author: camellia
 * @date: 2021-03-04
 */
class LcsController extends BaseController
{
    private $str1;
    private $str2;
    private $c = array();

    /**
     * @name: Returns the longest common subsequence of strings one and two
     * @author: camellia
     * @date: 2021-03-04
     * @param: string $str1, string $str2, int $len1, int $len2 (lengths in bytes; 0 means "use strlen")
     * @return: string the longest common subsequence
     */
    public function getLCS($str1, $str2, $len1 = 0, $len2 = 0)
    {
        $this->str1 = $str1;
        $this->str2 = $str2;
        if ($len1 == 0) $len1 = strlen($str1);
        if ($len2 == 0) $len2 = strlen($str2);
        $this->initC($len1, $len2);
        return $this->printLCS($this->c, $len1 - 1, $len2 - 1);
    }

    /**
     * @name: Returns the similarity of two strings
     * @author: camellia
     * @date: 2021-03-04
     * @param: string $str1, string $str2
     * @return: int|float 2 * LCS length / (length of $str1 + length of $str2), between 0 and 1
     */
    public function getSimilar($str1, $str2)
    {
        // strlen() counts bytes, so all lengths here are byte lengths
        $len1 = strlen($str1);
        $len2 = strlen($str2);
        $len = strlen($this->getLCS($str1, $str2, $len1, $len2));
        if (($len1 + $len2) > 0)
        {
            return $len * 2 / ($len1 + $len2);
        }
        else
        {
            return 0;
        }
    }

    /**
     * @name: Fills the LCS length table $this->c
     * @author: camellia
     * @date: 2021-03-04
     * @param: int $len1, int $len2 (lengths of the two strings in bytes)
     * @return: void
     */
    public function initC($len1, $len2)
    {
        // Initialise the first column and the first row to 0
        for ($i = 0; $i < $len1; $i++) {
            $this->c[$i][0] = 0;
        }
        for ($j = 0; $j < $len2; $j++) {
            $this->c[0][$j] = 0;
        }
        // Fill the rest of the table with the LCS recurrence
        for ($i = 1; $i < $len1; $i++) {
            for ($j = 1; $j < $len2; $j++) {
                if ($this->str1[$i] == $this->str2[$j])
                {
                    $this->c[$i][$j] = $this->c[$i - 1][$j - 1] + 1;
                }
                else if ($this->c[$i - 1][$j] >= $this->c[$i][$j - 1])
                {
                    $this->c[$i][$j] = $this->c[$i - 1][$j];
                }
                else
                {
                    $this->c[$i][$j] = $this->c[$i][$j - 1];
                }
            }
        }
    }

    /**
     * @name: Rebuilds the longest common subsequence by walking back through the table
     * @author: camellia
     * @date: 2021-03-04
     * @param: array $c the length table, int $i index into $str1, int $j index into $str2
     * @return: string the common subsequence up to positions $i and $j
     */
    public function printLCS($c, $i, $j)
    {
        if ($i < 0 || $j < 0)
        {
            return "";
        }
        if ($i == 0 || $j == 0)
        {
            if ($this->str1[$i] == $this->str2[$j])
            {
                return $this->str2[$j];
            }
            else
            {
                return "";
            }
        }
        if ($this->str1[$i] == $this->str2[$j])
        {
            return $this->printLCS($this->c, $i - 1, $j - 1) . $this->str2[$j];
        }
        else if ($this->c[$i - 1][$j] >= $this->c[$i][$j - 1])
        {
            return $this->printLCS($this->c, $i - 1, $j);
        }
        else
        {
            return $this->printLCS($this->c, $i, $j - 1);
        }
    }
}
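The class above measures lengths with strlen() and indexes strings byte by byte. As a possible variation, and not the class I actually use, the same 2 × LCS / (len1 + len2) score can be computed over whole characters. This is only a sketch: the function name lcs_similarity is hypothetical, and it assumes the input is valid UTF-8.

```php
<?php
/**
 * Character-aware LCS similarity (hypothetical helper, for illustration only).
 * Returns 2 * |LCS| / (len1 + len2), with lengths counted in characters.
 */
function lcs_similarity(string $s1, string $s2): float
{
    // Split into UTF-8 characters; preg_split() returns false on invalid
    // UTF-8, in which case we fall back to an empty list.
    $a = preg_split('//u', $s1, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $b = preg_split('//u', $s2, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $n = count($a);
    $m = count($b);
    if ($n + $m === 0) {
        return 0.0;
    }
    // $c[$i][$j] = LCS length of the first $i characters of $a
    // and the first $j characters of $b.
    $c = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = 1; $i <= $n; $i++) {
        for ($j = 1; $j <= $m; $j++) {
            if ($a[$i - 1] === $b[$j - 1]) {
                $c[$i][$j] = $c[$i - 1][$j - 1] + 1;
            } else {
                $c[$i][$j] = max($c[$i - 1][$j], $c[$i][$j - 1]);
            }
        }
    }
    return 2 * $c[$n][$m] / ($n + $m);
}

echo lcs_similarity("hello word", "hello china"), "\n";
```

Only the table length is needed for the score, so this sketch skips rebuilding the subsequence string itself.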
Call:
/**
 * @name: Sorts an array by similarity
 * @author: camellia
 * @date: 2021-03-04
 * @param: array $array rows to sort, string $keyword search term, then the keys of the title, content and description fields
 * @return: array the rows sorted by similarity
 */
public function similar_arr($array, $keyword, $arr_key_one = 'arttitle', $arr_key_two = 'content', $arr_key_three = 'artdesc')
{
    $lcs = new LcsController();
    $data = [];
    // Array similarity processing
    foreach ($array as $key => $value) {
        // similar_text does not handle Chinese similarity well
        // similar_text($value[$arr_key], $keyword, $percent);
        $title_percent = $lcs->getSimilar($value[$arr_key_one], $keyword);
        // Returns the longest common subsequence
        // echo $lcs->getLCS("hello word", "hello china");
        $value['title_percent'] = $title_percent;
        // $content_percent = $lcs->getSimilar($value[$arr_key_two], $keyword);
        // $value['content_percent'] = $content_percent;
        $desc_percent = $lcs->getSimilar($value[$arr_key_three], $keyword);
        $value['desc_percent'] = $desc_percent;
        $data[] = $value;
    }
    // Take the percent column from the array and return a one-dimensional array
    // $percent = array_column($data, 'percent');
    // Sort by percent
    // array_multisort($percent, SORT_DESC, $data);
    // $array = $this->sortArrByManyField($data, 'title_percent', SORT_DESC, 'content_percent', SORT_DESC, 'desc_percent', SORT_DESC);
    // Sort by title similarity, then id, then description similarity (each row must have an 'id' key)
    $array = $this->sortArrByManyField($data, 'title_percent', SORT_DESC, 'id', SORT_DESC, 'desc_percent', SORT_DESC);
    return $array;
}

/**
 * @name: Sorts a two-dimensional PHP array by multiple fields
 * @author: camellia
 * @date: 2021-03-04
 * @param: array of rows, followed by pairs of field name and sort flag (e.g. 'id', SORT_DESC)
 * @return: array|null the sorted rows, or null when called without arguments
 */
public function sortArrByManyField()
{
    $args = func_get_args(); // Get an array of function arguments
    if (empty($args))
    {
        return null;
    }
    $arr = array_shift($args);
    if (!is_array($arr))
    {
        // Leading backslash: this code lives inside a namespace
        throw new \Exception("The first argument is not an array");
    }
    foreach ($args as $key => $field)
    {
        if (is_string($field))
        {
            // Replace each field name with that field's column of values
            $temp = array();
            foreach ($arr as $index => $val)
            {
                $temp[$index] = $val[$field];
            }
            $args[$key] = $temp;
        }
    }
    $args[] = &$arr; // Pass the rows by reference so they are reordered as well
    call_user_func_array('array_multisort', $args);
    return array_pop($args);
}
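To show what the multi-field sort does underneath, here is a small standalone sketch that calls array_multisort() directly. The sample rows are made up for illustration; only the field names mirror the ones used above.

```php
<?php
// Hypothetical sample rows, not real search results.
$rows = [
    ['id' => 1, 'title_percent' => 0.5, 'desc_percent' => 0.2],
    ['id' => 2, 'title_percent' => 0.8, 'desc_percent' => 0.1],
    ['id' => 3, 'title_percent' => 0.5, 'desc_percent' => 0.9],
];

// Build one column per sort key, then let array_multisort() reorder $rows:
// first by title_percent (descending), and for ties by id (descending).
$title = array_column($rows, 'title_percent');
$ids   = array_column($rows, 'id');
array_multisort($title, SORT_DESC, $ids, SORT_DESC, $rows);

print_r(array_column($rows, 'id'));
```

The row with the highest title similarity comes first, and the two rows that tie on title similarity are ordered by id, newest first.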
Call the similarity sorting method:
$listShow = $this->similar_arr($list, $search, 'arttitle');
The resulting similarity is not perfectly accurate, but it is better than PHP's built-in similar_text function.
To see the results in practice, visit my personal blog: guanchao.site
If you have good suggestions, please leave them in the comments below.