Episodes
This is the podcast for the thirteenth class, in which we look at a couple of ways to organize and present information to the user. We see how a term-cloud interface can be created, allowing the user to get a quick glance at the underlying collection. We also talk about a number of clustering algorithms and see how they can be implemented with Lemur.
Published 11/17/09
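As a rough illustration of the term-cloud idea mentioned in the thirteenth-class episode above, the sketch below maps term frequencies to font sizes. It is not taken from the class materials: the function name `term_cloud_sizes`, its parameters, and the linear scaling are all assumptions made for illustration.

```python
from collections import Counter


def term_cloud_sizes(tokens, top_n=5, min_size=10, max_size=30):
    """Map the top_n most frequent terms to font sizes (hypothetical helper).

    Sizes are scaled linearly between min_size and max_size based on
    each term's frequency relative to the other selected terms.
    """
    counts = Counter(tokens).most_common(top_n)
    if not counts:
        return {}
    highest = counts[0][1]   # most_common sorts by descending count
    lowest = counts[-1][1]
    span = highest - lowest
    sizes = {}
    for term, count in counts:
        # If all selected terms share one frequency, use the largest size.
        scale = (count - lowest) / span if span else 1.0
        sizes[term] = round(min_size + scale * (max_size - min_size))
    return sizes


tokens = ["ir"] * 3 + ["lemur"] * 2 + ["cloud"]
sizes = term_cloud_sizes(tokens)
```

The resulting dictionary could then drive the font size of each term in an HTML page; a real interface would likely also remove stop words first.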
This is the podcast for the twelfth class, in which we see how REST requests can be made over the web and how the XML responses can be parsed. This allows us to start connecting with Web 2.0 sources that support combining different sources through open data exchange.
Published 11/10/09
This is the podcast for the eleventh class, in which we see traditional and non-traditional methods of collecting data from the web. The traditional way is demonstrated by web crawling with wget, and the non-traditional way is illustrated with YouTube harvesting.
Published 11/03/09
This is the podcast for the tenth class, in which we connect the back-end for search that we have been working with to a web-based front-end. This is done using Indri, a new search engine component for Lemur. We also explore some details of AJAX and see how we could use it to enhance our user interface for search.
Published 10/27/09
This is the podcast for the ninth class, in which we continue looking at evaluation. We talk about more measures to evaluate a query and a system. We also look at comparing two rank lists.
Published 10/13/09
This is the podcast for the eighth class, in which we start looking at one of the core components of IR: evaluation. We begin our discussion by revisiting recall and precision, and then continue by exploring R-precision, AP, and MAP. We see how these can be measured manually and then with TREC-supplied tools.
Published 10/06/09
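For readers who want a concrete picture of the measures named in the eighth-class episode above, here is a minimal sketch using their standard definitions. It is not the TREC tooling used in class; the function names and the toy document IDs are assumptions made for illustration.

```python
def precision_recall(ranked, relevant):
    """Precision and recall of a retrieved list against a set of relevant docs."""
    hits = sum(1 for doc in ranked if doc in relevant)
    precision = hits / len(ranked) if ranked else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranked[:r] if doc in relevant) / r


def average_precision(ranked, relevant):
    """AP: mean of the precision values at each rank where a relevant doc appears.

    Relevant documents that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """MAP: mean of AP over (ranked list, relevant set) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy example with made-up document IDs.
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
p, r = precision_recall(ranked, relevant)
rp = r_precision(ranked, relevant)
ap = average_precision(ranked, relevant)
```

In this toy run, two of the four retrieved documents are relevant, and one relevant document (`d5`) is never retrieved, which lowers both recall and AP.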
This is the podcast for the seventh class, in which we look at how structured queries with term weights can be executed using Lemur. We use this to provide terminological feedback to the user, and incorporate the relevance feedback that the user provides into the retrieval process.
Published 10/06/09
This is the podcast for the sixth class, in which we continue exploring models for retrieving information. So far we have seen vector space, boolean, and language models. Now we talk about probabilistic and relevance models.
Published 09/29/09
This is the podcast for the fifth class, in which we dive into models for retrieving information. We had already looked at the vector space model. Now we talk about boolean retrieval and language models for retrieval.
Published 09/22/09
This is the podcast for the fourth class, in which we look at how queries can be processed using the Lemur Toolkit. We then match the processed queries against the collection index to retrieve a ranked list.
Published 09/15/09
This is the podcast for the third class, in which we go through the indexing process, first doing tokenization, stop-word removal, and stemming manually, and then with the help of the Lemur Toolkit.
Published 09/08/09
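The three manual steps named in the third-class episode above (tokenization, stop-word removal, stemming) can be sketched as follows. This is only an illustration, not the Lemur Toolkit pipeline: the stop-word list is a tiny made-up sample, and `crude_stem` is a deliberately simple suffix stripper standing in for a real stemmer such as Porter's.

```python
import re

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "are", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Strip one common suffix, keeping at least three characters.

    A stand-in for a real stemmer; it does not implement Porter's rules.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_index(docs):
    """Build an inverted index mapping each stemmed term to a set of doc IDs."""
    index = {}
    for doc_id, text in docs.items():
        for term in map(crude_stem, remove_stop_words(tokenize(text))):
            index.setdefault(term, set()).add(doc_id)
    return index


docs = {"d1": "The cats are running", "d2": "A cat indexed"}
index = build_index(docs)
```

Because both "cats" and "cat" reduce to the same term, a query for either would match both toy documents.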
This is the podcast for the second class, in which we continue working with MySQL, look at full-text search in the database, create a basic UI for database access, and start looking at IR from unstructured data sources.
Published 09/01/09
This is the podcast for the first class, in which the course and its structure are introduced. MySQL is introduced with some working examples.
Published 08/28/09
This is a pre-course podcast welcoming the students to the course and providing some information about how this course will work.
Published 08/16/09