DIS 2006/2007

Exercise 8: TF/IDF ranking

In this exercise we'll have a look at how the TF/IDF ranking works.

There are 5 different documents in the collection:

D1 = "If it walks like a duck and quacks like a duck, it must be a duck."
D2 = "Beijing Duck is mostly prized for the thin, crispy duck skin with authentic versions of the dish serving mostly the skin."
D3 = "Bugs' ascension to stardom also prompted the Warner animators to recast Daffy Duck as the rabbit's rival, intensely jealous and determined to steal back the spotlight while Bugs remained indifferent to the duck's jealousy, or used it to his advantage. This turned out to be the recipe for the success of the duo."
D4 = "6:25 PM 1/7/2007 blog entry: I found this great recipe for Rabbit Braised in Wine on cookingforengineers.com."
D5 = "Last week Li has shown you how to make the Sechuan duck. Today we'll be making Chinese dumplings (Jiaozi), a popular dish that I had a chance to try last summer in Beijing. There are many recipies for Jiaozi."

Task 1. For the query Q = "Beijing duck recipe", find the two top-ranked documents according to the TF/IDF ranking. Assume the cosine similarity measure and the culinary term set T = {beijing, dish, duck, rabbit, recipe, roast}. Are the top-ranked documents relevant to the query?
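One way to check a hand calculation for Task 1 is a short script. The sketch below is not the course's reference solution; it rests on several assumptions that the exercise does not fix: weights are raw term frequency times ln(N/df), a term that occurs in no document (here "roast") gets weight 0, and terms are matched exactly after lowercasing and splitting on non-letters, so "duck's" counts as "duck" but "recipies" in D5 does not count as "recipe". All function and variable names are made up for illustration.

```python
import math
import re

TERMS = ["beijing", "dish", "duck", "rabbit", "recipe", "roast"]

DOCS = {
    "D1": "If it walks like a duck and quacks like a duck, it must be a duck.",
    "D2": "Beijing Duck is mostly prized for the thin, crispy duck skin with "
          "authentic versions of the dish serving mostly the skin.",
    "D3": "Bugs' ascension to stardom also prompted the Warner animators to "
          "recast Daffy Duck as the rabbit's rival, intensely jealous and "
          "determined to steal back the spotlight while Bugs remained "
          "indifferent to the duck's jealousy, or used it to his advantage. "
          "This turned out to be the recipe for the success of the duo.",
    "D4": "6:25 PM 1/7/2007 blog entry: I found this great recipe for Rabbit "
          "Braised in Wine on cookingforengineers.com.",
    "D5": "Last week Li has shown you how to make the Sechuan duck. Today "
          "we'll be making Chinese dumplings (Jiaozi), a popular dish that I "
          "had a chance to try last summer in Beijing. There are many "
          "recipies for Jiaozi.",
}
QUERY = "Beijing duck recipe"


def term_counts(text):
    """Raw counts of each term in TERMS (assumption: exact token matches)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tokens.count(term) for term in TERMS]


def idf_weights(count_vectors):
    """ln(N / df) per term; 0 for a term that appears in no document."""
    n = len(count_vectors)
    weights = []
    for i in range(len(TERMS)):
        df = sum(1 for vec in count_vectors if vec[i] > 0)
        weights.append(math.log(n / df) if df else 0.0)
    return weights


def cosine(a, b):
    """Cosine of the angle between two vectors; 0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


counts = {name: term_counts(text) for name, text in DOCS.items()}
idf = idf_weights(list(counts.values()))


def weigh(count_vector):
    """Turn raw counts into TF/IDF weights."""
    return [tf * w for tf, w in zip(count_vector, idf)]


query_vec = weigh(term_counts(QUERY))
scores = {name: cosine(query_vec, weigh(vec)) for name, vec in counts.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(name, round(scores[name], 3))
```

Under these assumptions D2 and D3 share the top score (about 0.52), with D5 close behind (about 0.51). A different TF/IDF variant, for example log-scaled term frequency or stemming that maps "recipies" to "recipe", can change the order, so the numbers are only meaningful relative to the stated weighting.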

Task 2. Assume that the author of document D5 goes on to tell more about her summer trip to China before doing the cooking and uses the word Beijing three times instead of just once. What happens to the rank of D5? How can this be interpreted in the vector retrieval model (vectors and angles between them)? Is this change in the ranking of D5 a desirable property of TF/IDF? Why?
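The effect asked about in Task 2 can be explored numerically. The sketch below is an illustration under assumptions, not the official solution: it uses raw term frequency times ln(N/df), gives weight 0 to a term found in no document, and starts from term counts derived by hand from D1 to D5 with exact, case-insensitive matching (so "recipies" in D5 is not counted as "recipe"). The names `COUNTS`, `score_all`, and `angle_deg` are hypothetical.

```python
import math

# Term order: beijing, dish, duck, rabbit, recipe, roast
# Hand-derived counts (assumption: exact, case-insensitive matches only).
COUNTS = {
    "D1": [0, 0, 3, 0, 0, 0],
    "D2": [1, 1, 2, 0, 0, 0],
    "D3": [0, 0, 2, 1, 1, 0],
    "D4": [0, 0, 0, 1, 1, 0],
    "D5": [1, 1, 1, 0, 0, 0],
}
QUERY_COUNTS = [1, 0, 1, 0, 1, 0]  # "Beijing duck recipe"


def score_all(counts):
    """Cosine similarity of every document to the query under TF/IDF."""
    n = len(counts)
    n_terms = len(QUERY_COUNTS)
    idf = []
    for i in range(n_terms):
        df = sum(1 for vec in counts.values() if vec[i] > 0)
        idf.append(math.log(n / df) if df else 0.0)

    def weigh(vec):
        return [tf * w for tf, w in zip(vec, idf)]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query = weigh(QUERY_COUNTS)
    return {name: cosine(query, weigh(vec)) for name, vec in counts.items()}


def angle_deg(similarity):
    """Angle between query and document vectors, in degrees."""
    return math.degrees(math.acos(min(1.0, similarity)))


before = score_all(COUNTS)
# Same collection, but "Beijing" now occurs three times in D5.
after = score_all(dict(COUNTS, D5=[3, 1, 1, 0, 0, 0]))

for name in sorted(COUNTS):
    print(name,
          round(before[name], 3), round(angle_deg(before[name]), 1),
          "->",
          round(after[name], 3), round(angle_deg(after[name]), 1))
```

With this weighting the document frequencies do not change, so only the D5 vector moves: it is pulled toward the "beijing" axis, where the query also has a large component, and the angle to the query shrinks. The similarity of D5 rises from roughly 0.51 to roughly 0.67, which puts it ahead of D2 and D3 even though the extra occurrences of "Beijing" say nothing about duck recipes.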

Solution

Excel sheet with calculations
