url – https://en.wikipedia.org/wiki/Natural_language_processing

Task 1: Text Summarization with Word Frequencies

1.1 Use the web scraping technique with BeautifulSoup, as shown in class, to get the text data from the specified data location on the Wikipedia webpage. Hint: See the code snippets for web scraping in the lecture slides.

1.2 Preprocess the text data, including word tokenization, stop words and punctuation removal, etc.

1.3 Calculate word frequencies or weighted word frequencies. The NLTK FreqDist() function can be used to get original word frequencies.

1.4 Calculate the sentence scores by summing up the word (term) frequencies for each sentence after preprocessing. You can use different approaches.

1.5 Rank the sentences based on their sentence scores.

1.6 Build a webpage summary based on the N top scoring sentences. Then create a new summary by restricting the vocabulary of considered tokens by either: 1) only including the K most frequent tokens within the document or 2) only including tokens that occur in at least K sentences.
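Steps 1.2 to 1.6 could be sketched roughly as follows. This is only one possible approach, not the expected solution. To keep it runnable without downloading NLTK data, it makes several assumptions: a regular expression stands in for nltk.word_tokenize and nltk.sent_tokenize, collections.Counter stands in for FreqDist(), and STOP_WORDS is a tiny illustrative list rather than the NLTK stop-word corpus. All function names are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); nltk.corpus.stopwords
# would normally supply the full English list.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "with"}

def split_sentences(text):
    """Naive sentence splitter (stand-in for nltk.sent_tokenize)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def preprocess(sentence):
    """Lowercase, keep alphanumeric tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def weighted_frequencies(tokens):
    """Divide each count by the largest count, so values fall in (0, 1]."""
    counts = Counter(tokens)
    top = max(counts.values(), default=1)
    return {word: count / top for word, count in counts.items()}

def score_sentences(sentences, freqs):
    """Sum the frequencies of each sentence's preprocessed tokens."""
    return {s: sum(freqs.get(t, 0.0) for t in preprocess(s)) for s in sentences}

def top_k_vocab(text, k):
    """Option 1: the K most frequent tokens in the document."""
    tokens = [t for s in split_sentences(text) for t in preprocess(s)]
    return {word for word, _ in Counter(tokens).most_common(k)}

def min_sentence_vocab(text, k):
    """Option 2: tokens that occur in at least K sentences."""
    sentence_counts = Counter()
    for s in split_sentences(text):
        sentence_counts.update(set(preprocess(s)))
    return {word for word, count in sentence_counts.items() if count >= k}

def summarize(text, n, vocab=None):
    """Return the N top-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    tokens = [t for s in sentences for t in preprocess(s)]
    freqs = weighted_frequencies(tokens)
    if vocab is not None:
        # Restricted vocabulary: tokens outside vocab contribute nothing.
        freqs = {w: f for w, f in freqs.items() if w in vocab}
    scores = score_sentences(sentences, freqs)
    ranked = sorted(sentences, key=scores.get, reverse=True)
    best = set(ranked[:n])
    return " ".join(s for s in sentences if s in best)
```

Calling summarize(text, n=2) returns the two highest-scoring sentences, while summarize(text, n=2, vocab=top_k_vocab(text, 10)) scores sentences using only the ten most frequent tokens. Because scores are plain sums, longer sentences tend to rank higher; dividing by sentence length is one of the "different approaches" that could be tried.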

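For step 1.1, the page text could be pulled out roughly as shown here. This is a minimal sketch and not the snippet from the lecture slides. It assumes the "specified data location" is the set of <p> elements on the page, which the assignment text does not confirm. The function names and the User-Agent string are hypothetical, and urllib from the standard library is used for the download.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

URL = "https://en.wikipedia.org/wiki/Natural_language_processing"

def extract_paragraph_text(html):
    """Join the text of all non-empty <p> elements in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = [p.get_text().strip() for p in soup.find_all("p")]
    return "\n".join(p for p in paragraphs if p)

def fetch_page_text(url=URL):
    """Download the page and return its paragraph text."""
    # Hypothetical User-Agent value; some sites reject requests without one.
    request = Request(url, headers={"User-Agent": "nlp-summarizer-example/0.1"})
    with urlopen(request, timeout=10) as response:
        html = response.read().decode("utf-8")
    return extract_paragraph_text(html)
```

Keeping the parsing in extract_paragraph_text, separate from the download, makes it possible to check the extraction on a small HTML string without network access.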

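To generate N-grams using NLTK:

import nltk
from nltk.util import ngrams

def generate_ngrams(text, n):
    # Lowercase and tokenize the text, then slide a window of size n over the tokens
    n_grams = ngrams(nltk.word_tokenize(text.lower()), n)
    # Join each n-gram tuple into a single space-separated string
    return [' '.join(grams) for grams in n_grams]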
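As a small illustration of how n-grams and FreqDist() fit together, the sketch below counts bigrams over a hand-written token list. The token list is made up for the example, and it is passed in directly so that no NLTK tokenizer data needs to be downloaded.

```python
from nltk import FreqDist
from nltk.util import ngrams

# Made-up tokens; passing a list avoids the need for tokenizer data.
tokens = ["natural", "language", "processing", "studies", "natural", "language"]

# ngrams() yields tuples of n consecutive tokens; join each into one string.
bigrams = [" ".join(gram) for gram in ngrams(tokens, 2)]

# FreqDist counts how often each bigram string occurs.
bigram_freqs = FreqDist(bigrams)

print(bigrams)
print(bigram_freqs["natural language"])  # prints 2
```

The same pattern works for any n; only the second argument to ngrams() changes.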

2.2 Write the code for text summarization with any N-grams. Note that we will check your program using at least two different values of n, e.g., n = 2, 3, or 4. Hints: a) (0.5 points) NLTK can be used to get N-grams, and FreqDist() can be used to calculate the n-gram frequencies.

2.3 Find the weighted frequencies of occurrence. You can use a function similar to the one from Task 1.

2.4 Define a function such as calculate_sentence_scores_ngram(sent_tokens, ngram_freqs, n_grams) to calculate the sentence scores for any N-grams. This function is similar to the one in Task 1.

2.5 Generate new document summaries, similar to Task 1.6, using the new n-gram scoring function with n = 3 (i.e., trigrams). As in Task 1.6, experiment with using the full vocabulary and a restricted vocabulary.
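Steps 2.2 to 2.5 might look something like the sketch below, under several assumptions. The n_grams parameter is treated as the n-gram size (the assignment does not say what it holds), sent_tokens is taken to be a list of sentence strings, n-grams are built within each sentence rather than across sentence boundaries, and stop words are not removed. A regular expression and collections.Counter stand in for nltk.word_tokenize and FreqDist() so the sketch runs without NLTK data. The helper names tokenize, ngrams_from_tokens, weighted_ngram_freqs, and summarize_ngram are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep alphanumeric tokens (stand-in for word_tokenize)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def ngrams_from_tokens(tokens, n):
    """Space-joined n-grams from a token list (stand-in for nltk.util.ngrams)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def weighted_ngram_freqs(sent_tokens, n):
    """Count n-grams per sentence, then divide by the largest count."""
    counts = Counter(
        gram for s in sent_tokens for gram in ngrams_from_tokens(tokenize(s), n)
    )
    top = max(counts.values(), default=1)
    return {gram: count / top for gram, count in counts.items()}

def calculate_sentence_scores_ngram(sent_tokens, ngram_freqs, n_grams):
    """Score each sentence by summing the frequencies of its n-grams."""
    scores = {}
    for sentence in sent_tokens:
        grams = ngrams_from_tokens(tokenize(sentence), n_grams)
        scores[sentence] = sum(ngram_freqs.get(g, 0.0) for g in grams)
    return scores

def summarize_ngram(sent_tokens, n_grams, top_n, vocab=None):
    """Return the top_n sentences by n-gram score, in original order."""
    freqs = weighted_ngram_freqs(sent_tokens, n_grams)
    if vocab is not None:
        # Restricted vocabulary: n-grams outside vocab contribute nothing.
        freqs = {g: f for g, f in freqs.items() if g in vocab}
    scores = calculate_sentence_scores_ngram(sent_tokens, freqs, n_grams)
    best = set(sorted(sent_tokens, key=scores.get, reverse=True)[:top_n])
    return [s for s in sent_tokens if s in best]
```

With this layout, summarize_ngram(sentences, 3, top_n) gives the trigram summary, and changing the second argument to 2 or 4 covers the other n-gram sizes. A restricted vocabulary can be passed as a set of n-gram strings, for example the K most frequent trigrams.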
