PTA 7-44 Word-Frequency-Based File Similarity (string processing + set container)

String handling
Use of set

The task is to implement a simple file similarity measure: the similarity of two files is the proportion of the words they have in common among all distinct words that appear in the two files. To simplify matters, Chinese is not considered here (because word segmentation is too difficult). Only English words with a length of at least 3 and at most 10 are considered; if a word is longer than 10 letters, only its first 10 letters are used.

Input format: The input begins with a positive integer N (≤ 100), the total number of files. The contents of each file follow in this format: the body of the file is given first, and the end of the file is marked by a line containing the single character #. After the N files, the number of queries M (≤ 10^4) is given, followed by M lines. Each line contains a pair of file numbers separated by spaces. Files are numbered from 1 to N in the order they are given.

Output format: For each query, output the similarity of the two files on one line, that is, the percentage of the common vocabulary of the two files in their total vocabulary, accurate to 1 decimal place.

Note that a "word" here consists only of letters and has a length of at least 3 and at most 10; for words longer than 10 letters, only the first 10 letters are considered. Words are separated by any character that is not an English letter. In addition, the same word in different case is treated as the same word, for example "You" and "you".

Example input:

3
Aaa Bbb Ccc
#
Bbb Ccc Ddd
#
Aaa2 Ccc Eee is at Ddd@Fff
#
2
1 2
1 3

Example output:

50.0%
33.3%
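The similarity described in the statement is the size of the intersection of the two word sets divided by the size of their union. As a minimal sketch of just that calculation (the function name `similarity` is an assumption for illustration and does not come from the original post), it could look like this:

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper: percentage of words shared by two files among all
// distinct words found in either file.
double similarity(const std::set<std::string>& a, const std::set<std::string>& b)
{
    std::size_t common = 0;
    for (const std::string& w : a)
        if (b.count(w) > 0)   // w also occurs in the other file
            ++common;

    // Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|
    const std::size_t total = a.size() + b.size() - common;

    // Two empty files have no vocabulary at all; report 0 instead of dividing by 0.
    return total == 0 ? 0.0 : common * 100.0 / total;
}
```

With the sample data, files 1 and 2 share 2 of 4 distinct words, and files 1 and 3 share 2 of 6, which matches the expected 50.0% and 33.3% once printed with one decimal place.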

To solve this, we read the input line by line, split each line into words at every character that is not an English letter, and store the words of each file in a set container so that duplicates are dropped automatically.
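The splitting step can be sketched on its own as follows. This is an illustrative variation under stated assumptions, not the author's exact code: the name `extractWords` is hypothetical, and the loop runs one position past the end of the line so that the last word is flushed without appending a sentinel character.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper: add every valid word of `line` to `words`.
// A word is a run of letters, lower-cased, cut to its first 10 letters,
// and kept only if it has at least 3 letters.
void extractWords(const std::string& line, std::set<std::string>& words)
{
    std::string word;
    for (std::size_t i = 0; i <= line.size(); ++i)
    {
        // Treat the position past the end as a separator.
        const unsigned char c = i < line.size() ? line[i] : ' ';
        if (std::isalpha(c))
        {
            if (word.size() < 10)   // ignore letters beyond the tenth
                word += static_cast<char>(std::tolower(c));
        }
        else
        {
            if (word.size() >= 3)   // drop words shorter than 3 letters
                words.insert(word);
            word.clear();
        }
    }
}
```

For the line `Aaa2 Ccc Eee is at Ddd@Fff` this yields the five words aaa, ccc, eee, ddd and fff: the digit and the @ act as separators, and "is" and "at" are too short. Passing the character as `unsigned char` avoids undefined behavior in `std::isalpha` when the input contains bytes outside the ASCII range.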

I have been at home this year because of the epidemic, and with so many chores to deal with, my efficiency has not been high. I still have a lot of goals to reach and a lot of things to do. My life will be wonderful. Come on!

The complete code is as follows:

#include <cstdio> // scanf, printf
#include <iostream>
#include <set>
#include <string>
#include <cctype>
using namespace std;

#define MAXN 105

int N, M;                // Number of files, number of queries
string str;              // Read each line
set<string> files[MAXN]; // The words in the file

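// Split one line into words and add the valid ones to file number No
void handleStr(string str, int No)
{
    string word;
    str += "."; // Sentinel so that the last word is also processed
    for (int i = 0; i < (int)str.size(); i++)
    {
        if (isalpha((unsigned char)str[i]))
        {
            if (word.size() < 10) // Keep only the first 10 letters
                word += (char)tolower((unsigned char)str[i]);
        }
        else
        {
            if (word.size() > 2 && word.size() < 11)
                files[No].insert(word);
            word.clear();
        }
    }
}

int main()
{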
    scanf("%d", &N);
    for (int i = 1; i <= N; i++)
    {
        do
        {
            getline(cin, str);
            handleStr(str, i);
        } while (str != "#");
    }
    scanf("%d", &M);
    int u, v;
    int same = 0, total = 0;
    for (int i = 0; i < M; i++)
    {
        scanf("%d%d", &u, &v);
        // Start from |A| + |B| and subtract one per common word to get |A ∪ B|
        total = (int)files[u].size() + (int)files[v].size();
        same = 0;
        for (set<string>::iterator it = files[u].begin(); it != files[u].end(); it++)
        {
            if (files[v].find(*it) != files[v].end())
            {
                same++;
                total--;
            }
        }
        printf("%.1f%%\n", total == 0 ? 0.0 : same * 100.0 / total);
    }
    return 0;
}