ask485 Search Engine

Project Overview

Ask485 is a scalable search engine similar to Google or Bing. Given a CSV file containing documents with their titles and IDs, a Hadoop MapReduce pipeline produces an inverted index of web pages. Each index entry contains a term and its inverse document frequency, followed by a group of three elements for each document in which the term appears: the document ID, the frequency of the term in that document, and the normalization factor for the document. The index server is a REST API that returns search results in JSON format for the front end to use. It retrieves "hits" and ranks documents by a weighted linear combination of two factors: the query-dependent tf-idf score and the query-independent PageRank score. The weight of the PageRank score is a parameter specified by the user. The "search" server uses Flask and Jinja2 templates to display search results in a manner similar to mainstream search engines.
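
The index entry layout and the ranking step described above can be illustrated with a minimal sketch. It is not the project's actual code: the whitespace-separated line format, the function names (parse_index_line, combined_score, rank_hits), the integer document IDs, and the choice to make the two weights sum to one are all assumptions made for illustration.

```python
def parse_index_line(line):
    """Parse one inverted-index entry.

    Assumed layout (whitespace-separated, hypothetical):
        term idf doc_id tf norm [doc_id tf norm ...]
    """
    fields = line.split()
    term, idf = fields[0], float(fields[1])
    postings = {}
    # After the term and idf, the fields come in groups of three.
    for i in range(2, len(fields), 3):
        doc_id, tf, norm = fields[i:i + 3]
        postings[int(doc_id)] = (int(tf), float(norm))
    return term, idf, postings


def combined_score(tfidf, pagerank, weight):
    """Weighted linear combination of the two ranking factors.

    `weight` is the user-specified PageRank weight; giving tf-idf the
    remaining (1 - weight) is an assumption of this sketch.
    """
    return weight * pagerank + (1.0 - weight) * tfidf


def rank_hits(tfidf_scores, pageranks, weight):
    """Return hits sorted by descending combined score.

    `tfidf_scores` and `pageranks` map doc_id -> score. Documents with
    no known PageRank are treated as 0.0 (an assumption).
    """
    hits = [
        {
            "docid": doc_id,
            "score": combined_score(tfidf, pageranks.get(doc_id, 0.0), weight),
        }
        for doc_id, tfidf in tfidf_scores.items()
    ]
    # Break ties by document ID so the ordering is deterministic.
    hits.sort(key=lambda hit: (-hit["score"], hit["docid"]))
    return hits
```

With a PageRank weight of 0 the ordering depends only on tf-idf, and with a weight of 1 it depends only on PageRank, which is one way to sanity-check the user-facing parameter.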

Date: Winter 2021

Project Role

I was primarily responsible for implementing the front-end "search" server, but I also worked alongside a group member to implement the "index" REST API server.

Relevant Technology

  • Python
  • sqlite3
  • Flask
  • Jinja2
  • HTML
  • CSS

Relevant Skills

  • REST APIs
  • PageRank
  • Information Retrieval
  • Dynamic server-side rendering
  • Relational database management
  • Full-stack development
  • Git collaboration and version control