Abstract:
A large and growing number of web pages display contextual advertising based on keywords automatically extracted from the text of the page, and this advertising is a substantial source of revenue supporting the web today. Despite the importance of this area, little formal, published research exists. We describe a system that learns how to extract keywords from web pages for advertisement targeting. The system uses a number of features, such as whether a keyword is present in meta-data and how often the term occurs in search queries. The system is trained with a set of example pages that have been hand-labeled with "relevant" keywords. Based on this training, it can extract new keywords from previously unseen pages.
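As a rough illustration of the pipeline described above, the following minimal sketch computes two features for each candidate keyword (presence in the page's meta-data and a log-scaled search-query count), fits a classifier on hand-labeled examples, and then scores candidates from an unseen page. The abstract does not name the learning algorithm, so the logistic regression used here is an assumption, as are the function names (`candidate_features`, `train`, `score`), the single-word candidates, and the toy query counts and labels, which are invented for illustration only.

```python
import math
import re


def candidate_features(term, meta_text, query_counts):
    """Build a feature vector for one single-word candidate keyword.

    Features (hypothetical encoding): a bias term, whether the term
    appears in the page's meta-data, and log(1 + search-query count).
    """
    meta_tokens = set(re.findall(r"[a-z0-9]+", meta_text.lower()))
    in_meta = 1.0 if term in meta_tokens else 0.0
    query_freq = math.log1p(query_counts.get(term, 0))
    return [1.0, in_meta, query_freq]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, features):
    """Estimated probability that the candidate is a relevant keyword."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def train(examples, epochs=500, lr=0.1):
    """Fit logistic regression with plain stochastic gradient ascent.

    `examples` is a list of (feature_vector, label) pairs, label in {0, 1}.
    """
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for features, label in examples:
            error = label - score(weights, features)
            for i, x in enumerate(features):
                weights[i] += lr * error * x
    return weights


# Toy, made-up data: one hand-labeled training page.
query_counts = {"camera": 900, "digital": 400, "click": 5, "page": 20,
                "boots": 300, "hiking": 250, "here": 10}
train_meta = "digital camera reviews"
labels = {"camera": 1, "digital": 1, "click": 0, "page": 0}

examples = [(candidate_features(t, train_meta, query_counts), y)
            for t, y in labels.items()]
weights = train(examples)

# Rank candidates from a previously unseen page.
new_meta = "hiking boots"
candidates = ["boots", "hiking", "here"]
ranked = sorted(
    candidates,
    key=lambda t: score(weights, candidate_features(t, new_meta, query_counts)),
    reverse=True,
)
```

In this sketch `ranked` orders the unseen page's candidates by estimated relevance. A real system would presumably draw on more features and far more labeled pages than this toy example.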