The text before

An ACL 2013 paper. The content is concise and easy to understand; I quite like it. Systems papers are just too hard to read!!

Pilehvar M T, Jurgens D, Navigli R. Align, disambiguate and walk: A unified approach for measuring semantic similarity[C]//Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2013, 1: 1341-1351.

Main text

Abstract

Semantic similarity is an essential component of many Natural Language Processing applications. However, prior methods for computing semantic similarity often operate at different levels, e.g., single words or entire documents, which requires adapting the method for each data type. We present a unified approach to semantic similarity that operates at multiple levels, all the way from comparing word senses to comparing text documents. Our method leverages a common probabilistic representation over word senses in order to compare different types of linguistic data. This unified representation shows state-of-the-art performance on three tasks: semantic textual similarity, word similarity, and word sense coarsening.


1 Introduction


Semantic similarity is a core technique for many topics in Natural Language Processing such as Textual Entailment (Berant et al., 2012), Semantic Role Labeling (Fürstenau and Lapata, 2012), and Question Answering (Surdeanu et al., 2011). For example, textual similarity enables relevant documents to be identified for information retrieval (Hliaoutakis et al., 2006), while identifying similar words enables tasks such as paraphrasing (Glickman and Dagan, 2003), lexical substitution (McCarthy and Navigli, 2009), lexical simplification (Biran et al., 2011), and Web search result clustering (Di Marco and Navigli, 2013).


Approaches to semantic similarity have often operated at separate levels: methods for word similarity are rarely applied to documents or even single sentences (Budanitsky and Hirst, 2006; Radinsky et al., 2011; Halawi et al., 2012), while document-based similarity methods require more linguistic features, which often makes them inapplicable at the word or microtext level (Salton et al., 1975; Maguitman et al., 2005; Elsayed et al., 2008; Turney and Pantel, 2010). Despite the potential advantages, few approaches to semantic similarity operate at the sense level due to the challenge in sense-tagging text (Navigli, 2009); for example, none of the top four systems in the recent SemEval-2012 task on textual similarity compared semantic representations that incorporated sense information (Agirre et al., 2012).


We propose a unified approach to semantic similarity across multiple representation levels from senses to documents, which offers two significant advantages. First, the method is applicable independently of the input type, which enables meaningful similarity comparisons across different scales of text or lexical levels. Second, by operating at the sense level, a unified approach is able to identify the semantic similarities that exist independently of the text’s lexical forms and any semantic ambiguity therein. For example, consider the sentences:


  • t1. A manager fired the worker.

  • t2. An employee was terminated from work by his boss.

A surface-based approach would label the sentences as dissimilar due to the minimal lexical overlap. However, a sense-based representation enables detection of the similarity between the meanings of the words, e.g., fire and terminate. Indeed, an accurate, sense-based representation is essential for cases where different words are used to convey the same meaning.


The contributions of this paper are threefold. First, we propose a new unified representation of the meaning of an arbitrarily-sized piece of text, referred to as a lexical item, using a sense-based probability distribution. Second, we propose a novel alignment-based method for word sense disambiguation during semantic comparison. Third, we demonstrate that this single representation can achieve state-of-the-art performance on three similarity tasks, each operating at a different lexical level: (1) surpassing the highest scores on the SemEval-2012 task on textual similarity (Agirre et al., 2012) that compares sentences, (2) achieving a near-perfect performance on the TOEFL synonym selection task proposed by Landauer and Dumais (1997), which measures word pair similarity, and also obtaining state-of-the-art performance in terms of the correlation with human judgments on the RG-65 dataset (Rubenstein and Goodenough, 1965), and finally (3) surpassing the performance of Snow et al. (2007) in a sense-coarsening task that measures sense similarity.


2 A Unified Semantic Representation


We propose a representation of any lexical item as a distribution over a set of word senses, referred to as the item’s semantic signature. We begin with a formal description of the representation at the sense level (Section 2.1). Following this, we describe our alignment-based disambiguation algorithm which enables us to produce sense-based semantic signatures for those lexical items (e.g., words or sentences) which are not sense annotated (Section 2.2). Finally, we propose three methods for comparing these signatures (Section 2.3). As our sense inventory, we use WordNet 3.0 (Fellbaum, 1998).


2.1 Semantic Signatures


The WordNet ontology provides a rich network structure of semantic relatedness, connecting senses directly with their hypernyms, and providing information on semantically similar senses by virtue of their nearby locality in the network. Given a particular node (sense) in the network, repeated random walks beginning at that node will produce a frequency distribution over the nodes in the graph visited during the walk. To extend beyond a single sense, the random walk may be initialized and restarted from a set of senses (seed nodes), rather than just one; this multi-seed walk produces a multinomial distribution over all the senses in WordNet with higher probability assigned to senses that are frequently visited from the seeds. Prior work has demonstrated that multinomials generated from random walks over WordNet can be successfully applied to linguistic tasks such as word similarity (Hughes and Ramage, 2007; Agirre et al., 2009), paraphrase recognition, textual entailment (Ramage et al., 2009), and pseudoword generation (Pilehvar and Navigli, 2013).


Formally, we define the semantic signature of a lexical item as the multinomial distribution generated from the random walks over WordNet 3.0 where the set of seed nodes is the set of senses present in the item. This representation encompasses both when the item is itself a single sense and when the item is a sense-tagged sentence.


To construct each semantic signature, we use the iterative method for calculating topic-sensitive PageRank (Haveliwala, 2002). Let M be the adjacency matrix for the WordNet network, where edges connect senses according to the relations defined in WordNet (e.g., hypernymy and meronymy). We further enrich M by connecting a sense with all the other senses that appear in its disambiguated gloss. Let v(0) denote the probability distribution for the starting location of the random walker in the network. Given the set of senses S in a lexical item, the probability mass of v(0) is uniformly distributed across the senses si ∈ S, with the mass for all sj ∉ S set to zero. The PageRank may then be computed using:

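The equation itself appears to have been an image in the original post and did not survive extraction. Reconstructed here from the standard topic-sensitive PageRank formulation (a hedged reconstruction, chosen to match the symbols above and the α/|S| restart behavior the paper describes):

```latex
\vec{v}^{(t)} = (1 - \alpha)\, \mathbf{M}\, \vec{v}^{(t-1)} + \alpha\, \vec{v}^{(0)} \qquad (1)
```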

where at each iteration, the random walker may jump to any node si ∈ S with probability α/|S|. We follow standard convention and set α to 0.15. We repeat the operation in Eq. 1 for 30 iterations, which is sufficient for the distribution to converge. The resulting probability vector v(t) is the semantic signature of the lexical item, as it has aggregated its senses’ similarities over the entire graph. For our semantic signatures we used the UKB off-the-shelf implementation of topic-sensitive PageRank.

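As a sketch of how such a signature could be computed, here is a minimal power-iteration version of Eq. 1 over an explicit transition matrix. The function name and matrix convention are mine for illustration; the paper uses the UKB implementation over the full WordNet graph.

```python
import numpy as np

def semantic_signature(M, seed_indices, alpha=0.15, iterations=30):
    """Topic-sensitive PageRank sketch.

    M            : n x n column-stochastic matrix; column j holds the
                   out-link probabilities of node j
    seed_indices : indices of the senses in the lexical item (the set S)
    Returns the probability vector v(t), i.e. the semantic signature.
    """
    n = M.shape[0]
    v0 = np.zeros(n)
    v0[seed_indices] = 1.0 / len(seed_indices)  # uniform mass over the seeds
    v = v0.copy()
    for _ in range(iterations):
        # follow an edge with prob. (1 - alpha), restart at a seed with prob. alpha
        v = (1 - alpha) * M @ v + alpha * v0
    return v
```

Nodes reachable from the seeds accumulate mass, so senses near the seed senses end up with the highest probabilities.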

2.2 Alignment-based Disambiguation


Commonly, semantic comparisons are between word pairs or sentence pairs that do not have their lexical content sense-annotated, despite the potential utility of sense annotation in making semantic comparisons. However, traditional forms of word sense disambiguation are difficult for short texts and single words because little or no contextual information is present to perform the disambiguation task. Therefore, we propose a novel alignment-based sense disambiguation that leverages the content of the paired item in order to disambiguate each element. Leveraging the paired item enables our approach to disambiguate where traditional sense disambiguation methods cannot due to insufficient context.


We view sense disambiguation as an alignment problem. Given two arbitrarily ordered texts, we seek the semantic alignment that maximizes the similarity of the senses of the context words in both texts. To find this maximum we use an alignment procedure which, for each word type wi in item t1, assigns wi to the sense that has the maximal similarity to any sense of the word types in the compared text t2. Algorithm 1 formalizes the alignment process, which produces a sense-disambiguated representation as a result. Senses are compared in terms of their semantic signatures, denoted as function R. We consider multiple definitions of R, defined later in Section 2.3.


As a part of the disambiguation procedure, we leverage the one sense per discourse heuristic of Yarowsky (1995); given all the word types in two compared lexical items, each type is assigned a single sense, even if it is used multiple times. Additionally, if the same word type appears in both sentences, both will always be mapped to the same sense. Although such a sense assignment is potentially incorrect, assigning both types to the same sense results in a representation that does no worse than a surface-level comparison.

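The alignment step (Algorithm 1 in the paper, not reproduced in this post) can be sketched as follows. Here `senses` and `R` are placeholder callables standing in for the WordNet sense inventory and the signature-similarity function, so this is an illustrative sketch rather than the paper's actual implementation:

```python
def align_disambiguate(t1_words, t2_words, senses, R):
    """For each word type in t1, choose the sense with maximal similarity
    (under R) to any sense of any word type in the paired text t2.

    senses : callable mapping a word to its list of candidate senses
    R      : callable giving the similarity of two senses' semantic signatures
    """
    assignment = {}
    for w1 in set(t1_words):  # one sense per discourse: iterate over word *types*
        best_sense, best_score = None, float("-inf")
        for s1 in senses(w1):
            # score s1 by its best match against any sense in the paired text
            score = max(R(s1, s2) for w2 in set(t2_words) for s2 in senses(w2))
            if score > best_score:
                best_sense, best_score = s1, score
        assignment[w1] = best_sense
    return assignment
```

Running the procedure in both directions (t1 against t2, then t2 against t1) disambiguates both items.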

We illustrate the alignment-based disambiguation procedure using the two example sentences t1 and t2 given in Section 1. Figure 1(a) illustrates example alignments of the first sense of manager to the first two senses of the word types in sentence t2, along with the similarity of the two senses’ semantic signatures. Among all the possible pairings with the senses of the word types in sentence t2, the sense manager1-n obtains its maximal similarity value with boss1-n, and as a result it is selected as the sense labeling for manager in sentence t1. Figure 1(b) shows the final, maximally-similar sense alignment of the word types in t1 and t2. The resulting alignment produces the following sets of senses:


2.3 Semantic Signature Similarity


Cosine Similarity. In order to compare semantic signatures, we adopt the Cosine similarity measure as a baseline method. The measure is computed by treating each multinomial as a vector and then computing the normalized dot product of the two signatures’ vectors.

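As a baseline sketch (assuming the signatures are given as dense probability vectors; the function name is illustrative):

```python
import numpy as np

def cosine_similarity(sig1, sig2):
    """Normalized dot product of two semantic-signature vectors."""
    v1, v2 = np.asarray(sig1, dtype=float), np.asarray(sig2, dtype=float)
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
```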

However, a semantic signature is, in essence, a weighted ranking of the importance of WordNet senses for each lexical item. Given that the WordNet graph has a non-uniform structure, and also given that different lexical items may be of different sizes, the magnitudes of the probabilities obtained may differ significantly between the two multinomial distributions. Therefore, for computing the similarity of two signatures, we also consider two nonparametric methods that use the ranking of the senses, rather than their probability values, in the multinomial.


Weighted Overlap. Our first measure provides a nonparametric similarity by comparing the rankings of the senses in the intersection of both semantic signatures. However, we additionally weight the similarity such that differences in the highest ranks are penalized more than differences in lower ranks. We refer to this measure as the Weighted Overlap. Let S denote the intersection of all senses with non-zero probability in both signatures, and r_i^j denote the rank of sense si ∈ S in signature j, where rank 1 denotes the highest rank. The sum of the two ranks r_i^1 and r_i^2 for a sense is then inverted, which (1) weights higher ranks more and (2), when summed, provides the maximal value when a sense has the same rank in both signatures. The unnormalized weighted overlap is then calculated as Σ_{i=1}^{|S|} (r_i^1 + r_i^2)^(-1). Then, to bound the similarity value in [0, 1], we normalize the sum by its maximum value, Σ_{i=1}^{|S|} (2i)^(-1), which occurs when each sense has the same rank in both signatures.

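A small sketch of the measure, assuming signatures are given as dicts mapping senses to their probabilities (an assumed representation for illustration; the paper works with full WordNet-sized multinomials):

```python
def weighted_overlap(sig1, sig2):
    """Rank-based similarity of two semantic signatures (dicts sense -> prob)."""
    def ranks(sig):
        # rank 1 = highest-probability sense
        ordered = sorted(sig, key=sig.get, reverse=True)
        return {sense: i + 1 for i, sense in enumerate(ordered)}
    r1, r2 = ranks(sig1), ranks(sig2)
    shared = [s for s in r1 if s in r2]  # senses with non-zero prob in both
    if not shared:
        return 0.0
    # inverted rank sums: mismatches at high ranks are penalized more
    score = sum(1.0 / (r1[s] + r2[s]) for s in shared)
    # maximum attained when every shared sense has the same rank in both
    max_score = sum(1.0 / (2 * (i + 1)) for i in range(len(shared)))
    return score / max_score
```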

Top-k Jaccard. Our second measure uses the ranking to identify the top-k senses in a signature, which are treated as the best representatives of the conceptual associates. We hypothesize that a specific rank ordering may be attributed to small differences in the multinomial probabilities, which can lower rank-based similarities when one of the compared orderings is perturbed due to slightly different probability values. Therefore, we consider the top-k senses as an unordered set, with equal importance in the signature. To compare two signatures, we compute the Jaccard index of the two signatures’ sets:


Sim(U, V) = |Uk ∩ Vk| / |Uk ∪ Vk|, where Uk denotes the set of k senses with the highest probability in the semantic signature U.

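The set-based measure is a one-liner in spirit; as before, this sketch assumes signatures as dicts mapping senses to probabilities:

```python
def top_k_jaccard(sig1, sig2, k=10):
    """Jaccard index of the top-k senses of two signatures (dicts sense -> prob)."""
    def top(sig):
        # the k highest-probability senses, taken as an unordered set
        return set(sorted(sig, key=sig.get, reverse=True)[:k])
    u, v = top(sig1), top(sig2)
    return len(u & v) / len(u | v)
```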

3 Experiment 1: Textual Similarity


Measuring the semantic similarity of textual items has applications in a wide variety of NLP tasks. As our benchmark, we selected the recent SemEval-2012 task on Semantic Textual Similarity (STS), which was concerned with measuring the semantic similarity of sentence pairs. The task received considerable interest by facilitating a meaningful comparison between approaches.


3.1 Experimental Setup


Data. We follow the experimental setup used in the STS task (Agirre et al., 2012), which provided five test sets, two of which had accompanying training data sets for tuning system performance. Each sentence pair in the datasets was given a score from 0 to 5 (low to high similarity) by human judges, with a high inter-annotator agreement of around 0.90 when measured using the Pearson correlation coefficient. Table 1 lists the number of sentence pairs in the training and test portions of each dataset.


Comparison Systems. The top-ranking participating systems in the SemEval-2012 task were generally supervised systems utilizing a variety of lexical resources and similarity measurement techniques. We compare our results against the top three systems of the 88 submissions: TLsim and TLsyn, the two systems of Šarić et al. (2012), and the UKP2 system (Bär et al., 2012). UKP2 utilizes extensive resources, among which is a Distributional Thesaurus computed on 10M dependency-parsed English sentences. In addition, the system utilizes techniques such as Explicit Semantic Analysis (Gabrilovich and Markovitch, 2007) and makes use of resources such as Wiktionary and Wikipedia, a lexical substitution system based on supervised word sense disambiguation (Biemann, 2013), and a statistical machine translation system. The TLsim system uses the New York Times Annotated Corpus, Wikipedia, and Google Book Ngrams. The TLsyn system also uses Google Book Ngrams, as well as dependency parsing and named entity recognition.


~ Nothing to add here; the rest poses no reading difficulty, so I won't translate it. ~

After the main text

That's it for now; off to read more papers. Although this article is a little old, it is still well worth referring to.