Tweet analyzer
Introduction
Implement the Tweet Analyzer to practice the basic usage of Python programming.
Requirement
This assignment is based on the social network company Twitter. Twitter allows users to read and post tweets that are between 1 and 280 characters long, inclusive. In this assignment, you will be writing functions that (we imagine) are part of the programs that manage Twitter feeds.
Here are some example tweets:
- Standing ovation as Setsuko Thurlow is awarded a Doctor of Laws degree, honoris causa, by the University of Toronto @UofT for her tireless nuclear disarmament work and contributions to the Treaty on the Prohibition of Nuclear Weapons with @nuclearban ICAN
- Congratulations to our class of 2019 #UofTGrad19
- #UofT’s @ProbabilityProf @UofTStatSci created a mathematical model at the start of the playoffs to figure out the team’s odds of winning. He predicts their home-court advantage will give them an edge. http://bit.ly/ProbProf
Some terminology
- tweet: A message posted on Twitter. For our purposes, the message text is between 1 and MAX_TWEET_LENGTH characters long (inclusive). MAX_TWEET_LENGTH is a constant.
- tweet word: A word in a tweet. For our purposes, a tweet word contains only alphanumeric characters and underscores. For example, pink_elephant is a tweet word, while bits&pieces is not. (In fact, bits&pieces is two tweet words, bits and pieces, with an ampersand (&) between them.)
- hashtag: A word in a tweet that begins with the hash symbol. Twitter uses the number sign (#) as the hash symbol. For our assignment, we’ll use the constant HASHTAG_SYMBOL to represent the hash symbol. Hashtags are used to label important words or terms in a tweet. A valid hashtag has the hash symbol as its first character and the rest of the characters form a tweet word. In other words, a hashtag begins with the hash symbol, and contains all alphanumeric characters and underscores up to (but not including) the first character that is neither alphanumeric nor an underscore (such as space, punctuation, etc.) or the end of the tweet. A hashtag either begins a tweet or is preceded by a character that is not alphanumeric and is not an underscore. A hashtag must contain at least one alphanumeric character. #UofT, #csc108, and #Go_Raptors are three examples of hashtags on Twitter.
Note that a hashtag is not a tweet word, because it has the hash symbol as its first character.
- mention: A word in a tweet that begins with the mention symbol. Twitter uses the at-sign (@) as the mention symbol. For our assignment, we’ll use the constant MENTION_SYMBOL to represent the mention symbol. Mentions are used to direct a message at or about a particular Twitter user, so the word should be a Twitter username (but for the purposes of this assignment, we will not check whether the username is valid; we’ll just assume it is). For our purposes, the definition of a mention is very similar to that of a hashtag. A valid mention has the mention symbol as its first character and the rest of the characters form a tweet word. In other words, a mention begins with the at-sign, and contains all alphanumeric characters and underscores up to (but not including) the first character that is neither alphanumeric nor an underscore (such as space, punctuation, etc.) or the end of the tweet. A mention either begins a tweet or is preceded by a character that is not alphanumeric and is not an underscore. A mention must contain at least one alphanumeric character. @redcrosscanada, @UN_Women, and @UofTGrad2019 are three examples of Twitter mentions.
Note that a mention is not a tweet word, because it has the mention symbol as its first character.
Here are some more interesting examples of how we will treat tweet words, hashtags, and mentions in this assignment.
In the tweet
Raptors win championship,#NBAFINALS, Go @Raptors!!! #WeTheNorth
we have four tweet words (Raptors, win, championship, and Go), two hashtags (#NBAFINALS and #WeTheNorth), and one mention (@Raptors). It is important to note that in this example there is no space between the first comma and the hashtag #NBAFINALS, there is a comma immediately following the hashtag #NBAFINALS, there are three exclamation marks immediately following the mention @Raptors, and there is more than one space after the exclamation marks. All these are valid in a tweet. Also note that the first occurrence of the word Raptors is not considered to be a mention, because it does not have the mention symbol.
In the tweet
@UofT welcomes its 2019 graduates! #UofTGrad2019#graduation!
we have four tweet words (welcomes, its, 2019, and graduates), two hashtags (#UofTGrad2019 and #graduation), and one mention (@UofT). It is important to note that in this example there is no space between the hashtags #UofTGrad2019 and #graduation. This is also valid in a tweet.
Some more obscure yet valid examples:
- In 'something#something_else', we consider 'something' to be a tweet word and '#something_else' to be a hashtag.
- In 'no@spaces#whatsoever?!', we consider 'no' to be a tweet word, '@spaces' to be a mention, and '#whatsoever' to be a hashtag.
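The examples above can be reproduced with a short sketch. This is not part of the starter code: the function name classify_tokens and the regular-expression approach are assumptions made for illustration, and the symbol values '#' and '@' are taken from the terminology section. The sketch follows the reading used in the examples, where a hashtag or mention may directly follow a tweet word with no space in between.

```python
import re

# Symbol values as described in the terminology section.
HASHTAG_SYMBOL = '#'
MENTION_SYMBOL = '@'

# An optional leading symbol followed by a run of alphanumerics/underscores.
TOKEN_PATTERN = re.compile(
    '[' + re.escape(HASHTAG_SYMBOL + MENTION_SYMBOL) + ']?[A-Za-z0-9_]+')


def classify_tokens(tweet: str) -> tuple:
    """Hypothetical helper: return (tweet_words, hashtags, mentions),
    each a list of strings in order of appearance in tweet.
    """
    words, hashtags, mentions = [], [], []
    for token in TOKEN_PATTERN.findall(tweet):
        if token[0] in (HASHTAG_SYMBOL, MENTION_SYMBOL):
            # A hashtag or mention needs at least one alphanumeric
            # character; this sketch skips symbol-plus-underscores tokens.
            if not any(ch.isalnum() for ch in token[1:]):
                continue
            if token[0] == HASHTAG_SYMBOL:
                hashtags.append(token)
            else:
                mentions.append(token)
        else:
            words.append(token)
    return words, hashtags, mentions


words, hashtags, mentions = classify_tokens(
    'Raptors win championship,#NBAFINALS, Go @Raptors!!! #WeTheNorth')
print(words)     # ['Raptors', 'win', 'championship', 'Go']
print(hashtags)  # ['#NBAFINALS', '#WeTheNorth']
print(mentions)  # ['@Raptors']
```

The pattern only recognizes ASCII letters and digits, which is a simplification chosen for this sketch.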
For a complete list of Twitter terms, check out the Twitter glossary.
Starter code
For this assignment, we are giving you some files, including a Python starter code file. Please download the Assignment 1 Files and extract the zip archive.
- Starter code: tweet.py
This file contains some constants, the header and the complete docstring (but not body) for the first function you are to write. Your job is to complete this file.
- Checker: a1_checker.py
We have provided a checker program that you should use to check your code. See below for more information about a1_checker.py.
Constants
Constants are special variables whose values do not change once assigned. A different naming convention (uppercase pothole) is used for constants, so that programmers know to not change their values. For example, in the starter code, the constant MAX_TWEET_LENGTH is assigned the value 50 at the beginning of the module and the value of MAX_TWEET_LENGTH should never change in your code. When writing your code, if you need to use the value of the maximum tweet length, you should use MAX_TWEET_LENGTH. The same goes for the other constant values.
Using constants simplifies code modifications and improves readability. If we later decide to use a different tweet length, we would only have to change the length in one place (the MAX_TWEET_LENGTH assignment statement), rather than throughout the program.
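As a small illustration, here is one possible sketch of is_valid_tweet (one of the functions this assignment asks for) written against the constant rather than the literal 50. The function body is an assumption for illustration, not the official solution.

```python
MAX_TWEET_LENGTH = 50  # value given for the starter code


def is_valid_tweet(text: str) -> bool:
    """Return True if and only if text contains between 1 and
    MAX_TWEET_LENGTH characters, inclusive.

    >>> is_valid_tweet('Hello Twitter!')
    True
    >>> is_valid_tweet('')
    False
    """
    # Referring to the constant means a change to MAX_TWEET_LENGTH
    # is picked up here without editing this function.
    return 1 <= len(text) <= MAX_TWEET_LENGTH
```

Had the body used 50 directly, changing the maximum length would require finding and editing every such literal.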
What to do
In the starter code file tweet.py, complete the following function definitions. Use the Function Design Recipe that you have been learning in this course. We have included the type contracts in the following table; please read through the table to understand how the functions will be used.
We will be evaluating your docstrings in addition to your code. Please include two examples in your docstrings. You will need to paraphrase the full descriptions of the functions to get an appropriate docstring description.
Function name: (Parameter types) -> Return type | Full Description (paraphrase to get a proper docstring description) |
---|---|
is_valid_tweet: (str) -> bool | The parameter represents a potential tweet. The function should return True if and only if the tweet contains between 1 and MAX_TWEET_LENGTH characters, inclusive. |
compare_tweet_lengths: (str, str) -> int | The two parameters represent valid tweets. This function must return one of three integers: 1 (if the first tweet is longer than the second), -1 (if the second tweet is longer than the first), or 0 (if the tweets have the same length). |
add_hashtag: (str, str) -> str | The first parameter represents a valid tweet. The second parameter represents a tweet word. Appending a space, a hash symbol, and the tweet word to the end of the original tweet will result in a potential tweet. If the potential tweet is a valid tweet, the function should return the potential tweet. If the potential tweet is not a valid tweet, the function should return the original tweet. For example (assuming the hash symbol is '#'), if the first argument is 'I like' and the second argument is 'csc108', then the function should return 'I like #csc108' if MAX_TWEET_LENGTH is at least 14. Otherwise, it should return 'I like'. |
contains_hashtag: (str, str) -> bool | The first parameter represents a valid tweet, and the second parameter represents a tweet word. This function should return True if and only if the tweet contains a hashtag made up of the hash symbol and the tweet word. For example (assuming the hash symbol is '#'), if the first argument is 'I like #csc108' and the second argument is 'csc108', then the function should return True. Notes: If the first argument is 'I like #csc108' and the second argument is 'csc', then the function should return False. Also, if the first argument is 'I like #csc108, #mat137, and #phl101' and the second argument is 'csc108', the function should return True. Hint: Use the helper function clean that is provided in the starter code. |
is_mentioned: (str, str) -> bool | The first parameter represents a valid tweet, and the second parameter represents a tweet word. This function should return True if and only if the tweet contains a mention made up of the mention symbol and the tweet word. For example (assuming the mention symbol is '@'), if the first argument is 'Go @Raptors!' and the second argument is 'Raptors', then the function should return True. Hint: This function is very similar to the function contains_hashtag. What can you do to avoid writing the same code twice? |
add_mention_exclusive: (str, str) -> str | The first parameter represents a valid tweet and the second parameter represents a tweet word. Appending a space, a mention symbol, and the tweet word to the end of the original tweet will result in a potential tweet. If the potential tweet is valid, the original tweet contains the given tweet word, and the original tweet does not mention the given tweet word, the function should return the potential tweet. Otherwise, the function should return the original tweet. For example (assuming the mention symbol is '@'), if the first argument is 'Go Raptors!' and the second argument is 'Raptors', then the function should return 'Go Raptors! @Raptors'. If, on the other hand, the first argument is 'Go @Raptors!' and the second argument is 'Raptors', then the function should return the original tweet 'Go @Raptors!'. Hint: Can you use one of your other functions as a helper function? |
num_tweets_required: (str) -> int | The parameter represents a message. This function should return the minimum number of tweets that would be required to communicate the entire message. Recall that the maximum length of a tweet is MAX_TWEET_LENGTH. Hint: The ceil function in the math module is useful here. |
get_nth_tweet: (str, int) -> str | The first parameter represents a message that a Twitter user would like to post, and the second parameter, n, represents an integer greater than or equal to 0. If the message contains too many characters, it would need to be split up into a sequence of tweets. All of the tweets in the sequence, except possibly the last tweet, would be of length MAX_TWEET_LENGTH. This function should return the nth valid tweet in the sequence of tweets. Note that the first tweet in the sequence has index 0 (the 0th tweet), the second tweet in the sequence has index 1, and so on. If the value of the second parameter is too large, so that there is no index-n tweet in the sequence, this function should return an empty string. |
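To make a few of the rows above concrete, here is a hedged sketch of four of the functions. It is one possible approach, not the official solution. The helper contains_tagged_word is a hypothetical name introduced here, and it uses the re module in place of the provided clean helper, whose behaviour this handout does not describe. The imports appear only so the sketch runs on its own; the assignment says not to add import statements to tweet.py.

```python
import math
import re

# Constant values as described in this handout.
MAX_TWEET_LENGTH = 50
HASHTAG_SYMBOL = '#'
MENTION_SYMBOL = '@'


def contains_tagged_word(tweet: str, symbol: str, word: str) -> bool:
    """Hypothetical shared helper: return True iff tweet contains symbol
    immediately followed by word, where word is not continued by another
    alphanumeric character or underscore.
    """
    pattern = re.escape(symbol + word) + r'(?![A-Za-z0-9_])'
    return re.search(pattern, tweet) is not None


def contains_hashtag(tweet: str, word: str) -> bool:
    """Return True iff tweet contains the hashtag made from word."""
    return contains_tagged_word(tweet, HASHTAG_SYMBOL, word)


def is_mentioned(tweet: str, word: str) -> bool:
    """Return True iff tweet contains the mention made from word."""
    return contains_tagged_word(tweet, MENTION_SYMBOL, word)


def num_tweets_required(message: str) -> int:
    """Return the minimum number of tweets needed to post message."""
    # 120 characters at 50 per tweet: ceil(2.4) == 3 tweets.
    return math.ceil(len(message) / MAX_TWEET_LENGTH)


def get_nth_tweet(message: str, n: int) -> str:
    """Return the index-n tweet of message, or '' if there is none."""
    start = n * MAX_TWEET_LENGTH
    # Slicing past the end of a string yields '', which covers
    # the case where n is too large.
    return message[start:start + MAX_TWEET_LENGTH]
```

Routing both contains_hashtag and is_mentioned through one helper is one way to respond to the hint about not writing the same code twice.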
Using Constants
As discussed in the Constants section above, your code should make use of the provided constants. If the value of one of those constants were changed, and your program rerun, your functions should work with those new values.
For example, if MAX_TWEET_LENGTH were changed, then your functions should work according to the new maximum tweet length.
Your docstring examples should reflect the given values of the constants in the provided starter code, and do not need to change.
No Input or Output
Your tweet.py file should contain the starter code, plus the function definitions specified above. tweet.py must not include any calls to the print and input functions. Do not add any import statements. Also, do not include any function calls or other code outside of the function definitions.
How should you test whether your code works
First, run the checker and review ALL output; you may need to scroll. You should also test each function individually by writing code to verify your functions in the Python shell. For example, after defining function compare_tweet_lengths, you might call it from the shell (e.g., compare_tweet_lengths('I love', 'programming')) to check whether it returns the right value (-1). One call usually isn’t enough to thoroughly test the function; for example, we should also test compare_tweet_lengths('programming', 'is fun'), which should return 1, and compare_tweet_lengths('this course', 'is for me!!'), which should return 0.
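The three calls above can be written out as quick checks. The body of compare_tweet_lengths shown here is only one possible implementation, included so that the checks run. The assert lines are meant for the Python shell or a separate scratch file, since tweet.py must not contain code outside the function definitions.

```python
def compare_tweet_lengths(tweet1: str, tweet2: str) -> int:
    """Return 1 if tweet1 is longer than tweet2, -1 if tweet2 is longer
    than tweet1, and 0 if they have the same length.
    """
    if len(tweet1) > len(tweet2):
        return 1
    if len(tweet1) < len(tweet2):
        return -1
    return 0


# One check per possible return value.
assert compare_tweet_lengths('I love', 'programming') == -1
assert compare_tweet_lengths('programming', 'is fun') == 1
assert compare_tweet_lengths('this course', 'is for me!!') == 0
```

An assert that passes produces no output, so silence here means all three calls returned the expected values.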
(This article is from CSProjectedu.com; reprint with source)