All mini-projects are due May 7th. Mini-projects are worth 66.6 points total (the value of the midterm exam).
You will be given 4 Mini-Project Assignments (below). You must stick to the instructions on each mini-project assignment; for example, you will be building PyGame Space Invaders to very specific instructions, and you cannot deviate from those instructions or ask to build another game:
- Crawler and File Traverser
- Web search engine interface
- System Monitor with Web or Desktop UI
- PyGame
You can choose 1 mini-project yourself that is not listed above; it should be very different from the projects listed above. If you choose to do that, you will have to provide a description by April 10th and stick to that description. Your project will be graded against the description you provide; if it does not match the description, it will be graded 0.
You need to submit a total of 3 or 4 mini-projects. Each mini-project will have plenty of extra-credit work, so if you choose to submit 3, you will need to do extra-credit work worth at least 33% for each project.
All mini-project installation and run instructions will need to be validated by another student in the class. The "validators" will receive 5% extra credit for each project they validate, provided I have no issues following the instructions when I try running the project.
We will continue to have quizzes, each worth 10 points, covering material from the previous lecture. They will be simpler after the [objects and exceptions] material because the course content will become more complex and open-ended. Not all classes will have quizzes.
If I find that you copied code from the internet for your projects (please trust me on this one: I am very good at recognizing whether a student did their own work and at finding copied content; changing function names, colors, variable names, printed text, and comments won't help), you will receive 0 on the project. I will not forgive plagiarized submissions, so if you receive 0 because you copied code, I won't listen to excuses. You get only one chance to submit, so make it right the first time.
Mini Project 1 - Indexer, Crawler, File Traverser and Search
The File Traverser is optional. You will be building a mini search engine.
You may be able to reuse the crawler program and the keyword search that you built as one of your homework programs.
Use http://newhaven.edu/search/ as inspiration and validation.
The scope of this project is:
- Your root directory must have a README.md with instructions on how to install and run the project. You will need to have another student from the same class validate your instructions by installing and running your project and working with you to improve the instructions. The student who does the validation receives 5% extra credit if I am able to run the project without any issues. A single student can do validations for multiple other students.
- Build a crawler that crawls all the pages in the newhaven.edu domain. There are about 26,100 pages. Use a * search to see all pages on http://newhaven.edu/search/. If you want to research existing Python crawler frameworks and use those, that's OK! If you consider multiple ways to crawl the content and document the trade-off analysis as part of your README.md, you will receive up to 20% extra credit. I will explain this in class.
- You may want to introspect the Google results behind the scenes to inform decisions about what you should and should not extract from a page when crawling it. I will show in class how to retrieve that information. Here is one example of the Google document returned:
```json
{
  "cacheUrl": "http://www.google.com/search?q=cache:NK4cDcsUUYUJ:www.newhaven.edu",
  "clicktrackUrl": "https://www.google.com/url?client=internal-element-cse&cx=017404113844510084297:wxasmbgdgl0&q=https://www.newhaven.edu/&sa=U&ved=2ahUKEwjqu8j-0b7vAhWbZs0KHVyyC2MQFjAAegQIBRAB&usg=AOvVaw1T26xXIgOiASa7-BCy7YTm",
  "content": "The University of New Haven, founded on the Yale campus in 1920, founded on \nthe Yale campus in 1920, is a private, coeducational university situated on the ...",
  "contentNoFormatting": "The University of New Haven, founded on the Yale campus in 1920, founded on \nthe Yale campus in 1920, is a private, coeducational university situated on the ...",
  "title": "University of New Haven: Home",
  "titleNoFormatting": "University of New Haven: Home",
  "formattedUrl": "https://www.newhaven.edu/",
  "unescapedUrl": "https://www.newhaven.edu/",
  "url": "https://www.newhaven.edu/",
  "visibleUrl": "www.newhaven.edu",
  "richSnippet": {
    "cseImage": {
      "src": "https://www.newhaven.edu/_resources/images/hero/charger-statue-snow-2021.jpg",
      "width": "150",
      "type": "0",
      "height": "80"
    },
    "metatags": {
      "twitterCard": "summary_large_image",
      "twitterSite": "@unewhaven",
      "twitterTitle": "Home - University of New Haven",
      "viewport": "width=device-width, initial-scale=1",
      "twitterDescription": "The University of New Haven, founded on the Yale campus in 1920, founded on the Yale campus in 1920, is a private, coeducational university situated on the coast of southern New England in West Haven, Connecticut. It’s a diverse and vibrant community of 7,000 students, with campuses across the country and around the world. Within our colleges and schools, students immerse themselves in a transformative, career-focused education across the liberal arts and sciences, fine arts, business, engineering, healthcare, public safety, and public service. We offer more than 100 academic programs, all grounded in a long-standing commitment to collaborative, interdisciplinary, project-based learning.",
      "twitterImage": "https://www.newhaven.edu/_resources/images/hero/charger-statue-snow-2021.jpg",
      "ogTitle": "Home - University of New Haven",
      "ogDescription": "The University of New Haven, founded on the Yale campus in 1920, founded on the Yale campus in 1920, is a private, coeducational university situated on the coast of southern New England in West Haven, Connecticut. It’s a diverse and vibrant community of 7,000 students, with campuses across the country and around the world. Within our colleges and schools, students immerse themselves in a transformative, career-focused education across the liberal arts and sciences, fine arts, business, engineering, healthcare, public safety, and public service. We offer more than 100 academic programs, all grounded in a long-standing commitment to collaborative, interdisciplinary, project-based learning.",
      "ogSiteName": "University of New Haven",
      "ogImage": "https://www.newhaven.edu/_resources/images/hero/charger-statue-snow-2021.jpg",
      "ogType": "website"
    }
  },
  "breadcrumbUrl": {
    "host": "www.newhaven.edu"
  }
}
```
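To make the crawling bullet concrete, here is a minimal standard-library sketch. The names (`LinkExtractor`, `in_domain`, `crawl`) and the `max_pages` cap are illustrative, not part of the assignment, and a real crawler would also want politeness delays and robots.txt handling — which is exactly where a framework comparison in your README.md earns the extra credit.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

DOMAIN = "newhaven.edu"  # the crawl must stay inside this domain

class LinkExtractor(HTMLParser):
    """Collects absolute link URLs from the <a href="..."> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # resolve relative links against the page URL
                    self.links.append(urljoin(self.base_url, value))

def in_domain(url):
    """True if the URL's host is newhaven.edu or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == DOMAIN or host.endswith("." + DOMAIN)

def crawl(seed, handle_page, max_pages=50):
    """Breadth-first crawl from `seed`, handing (url, html) to an indexer callback."""
    seen, queue = {seed}, deque([seed])
    while queue and len(seen) <= max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except OSError:
            continue  # unreachable page: skip it
        handle_page(url, html)
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            if in_domain(link) and link not in seen:
                seen.add(link)
                queue.append(link)
```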
- Store crawled content any way you like, but document in README.md what you used as storage, approximately how many bytes you will need to store ~26,100 pages, and how you calculated that. The part of the program that stores the data has to be in a separate class from the crawler; we will call the program that stores the content the indexer, and the stored content the index. To store the data you can use:
- a NoSQL datastore. I won't mind if you use a search engine as your data store! You may get a lot of hidden benefits if you go that route. Look into https://lucene.apache.org/ and some search engines built on top of it: https://www.elastic.co/elasticsearch/ and https://solr.apache.org/. Those are the leaders in the industry. https://neo4j.com/ is a graph DB; it's not a search engine, but it also utilizes indexing underneath and may add interesting analytical capabilities to your search.
- an in-memory datastore. Something like a data structure of custom objects or a dictionary will be sufficient. You may want to consider persisting your structure to a file (pickle, maybe?) so that you don't need to re-crawl the content every time you shut down the program.
- a SQL datastore.
- Implement search over the content. Multiple keywords must be supported; it is up to you how. Use the UNH site search for inspiration.
- Results must be returned as JSON with the number of results and the time the search took to retrieve them.
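One minimal shape for the indexer/index split and the required JSON output, assuming an in-memory inverted index and AND semantics for multiple keywords — both are illustrative choices, not requirements:

```python
import json
import time

class Indexer:
    """Separate storage class: keeps pages plus an inverted index (word -> set of URLs)."""

    def __init__(self):
        self.pages = {}  # url -> title
        self.index = {}  # word -> set of urls containing that word

    def add(self, url, title, text):
        self.pages[url] = title
        for word in text.lower().split():
            self.index.setdefault(word, set()).add(url)

    def search(self, *keywords):
        """AND-combines the keywords; returns JSON with result count and elapsed time."""
        start = time.perf_counter()
        hits = None
        for word in keywords:
            urls = self.index.get(word.lower(), set())
            hits = urls if hits is None else hits & urls
        results = [{"url": u, "title": self.pages[u]} for u in sorted(hits or set())]
        return json.dumps({
            "total": len(results),
            "time_seconds": round(time.perf_counter() - start, 6),
            "results": results,
        })
```

Swapping this class for Elasticsearch, Solr, or a SQL table only changes the internals; the crawler keeps calling the same `add` method, which is the point of keeping the indexer separate.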
File Traverser - Optional, 50% Extra Credit
The file traverser is similar to the crawler, but uses a folder on your drive as a seed and traverses all files in that directory, handing them off to the indexer.
The URL to the file content is replaced with the file URI scheme: https://en.wikipedia.org/wiki/File_URI_scheme
Since there is no file content on the UNH site, you can either add random files with text or perhaps your Python homework programs. The file-traversed content can be mixed with crawled content.
Mini Project 2 - Web Search Engine Interface
You will be building a search engine interface over the searchable content that you created in Mini-Project 1. Your interface should be similar to: newhaven.edu/search/
You should have the following components on the page:
- Search Text Box - where the user will enter the keywords they want to search for
- Search Button - the trigger for search
- Number of results and the time it took to collect those results, e.g.: About 28,100 results (0.24 seconds)
- Pagination
Search API Requirements:
- The URL with the search keywords should be bookmarkable. The search keywords can be passed via the URL (not only in the text box on the page). Please see the examples below.
- The URL should also accept the number of results to display and the offset; pagination should use this functionality to render the pages. (This does not work as I'd expect on the UNH website.) &num_results=10 would return 10 results per page; &offset=20 would skip the first 20 results and start displaying from result #21. Please note that your results must be sorted in some way for this functionality to work. As a result, if I bookmark search results from page 5, I should be returned to page 5 of the search results for the keywords I entered when I come back to that URL.
- Make sure the API doesn't break if no results are found or bad input was given
- The API should return:
- Total number of results
- The time it took to fetch all the results
- Number of results returned
- Offset
- Results, containing at least URL and Title. Extra Credit: +15% if you highlight the keywords the user searched for in the results (this doesn't work correctly on the UNH search site, but if you try searching for something in Google, you will see your keywords highlighted in the results)
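The required response fields can be assembled independently of any web framework. This sketch assumes hits are dicts with url/title keys (as in Mini-Project 1) and shows one way to keep bad `num_results`/`offset` input from breaking the API; the function name and default values are illustrative:

```python
import time

def search_response(all_hits, num_results=10, offset=0):
    """Builds one page of the API payload from the full hit list.

    Sorting by URL keeps the order stable, so a bookmarked offset
    always lands on the same page of results.
    """
    try:
        num_results = max(1, int(num_results))
        offset = max(0, int(offset))
    except (TypeError, ValueError):
        num_results, offset = 10, 0  # bad input: fall back to defaults instead of breaking
    start = time.perf_counter()
    ordered = sorted(all_hits, key=lambda hit: hit["url"])
    page = ordered[offset:offset + num_results]
    return {
        "total": len(ordered),
        "time_seconds": time.perf_counter() - start,
        "num_results": len(page),
        "offset": offset,
        "results": page,
    }
```

A Flask or Django view would simply parse `num_results` and `offset` from the query string, call this function, and serialize the dict as JSON.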
Search Text Box Requirements:
- Search should work if the user clicks the "Search" button or hits Enter (this works on the UNH site)
Pagination:
- Similar to the UNH Site, dynamically calculated based on the number of results and results per page settings.
- The current selected page is highlighted and dynamically calculated based on the num_results and offset parameters
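The two pagination rules above reduce to a little arithmetic on `num_results` and `offset`; the 10-link window in this sketch is an assumption modeled on typical search UIs, not a requirement:

```python
import math

def pagination(total, num_results, offset, window=10):
    """Derives the page strip and the highlighted page from num_results/offset."""
    total_pages = max(1, math.ceil(total / num_results))
    current = offset // num_results + 1  # pages are 1-based
    # show at most `window` page links, centred on the current page
    first = max(1, min(current - window // 2, total_pages - window + 1))
    last = min(total_pages, first + window - 1)
    return {"pages": list(range(first, last + 1)), "current": current}
```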
Extra Credits:
- Please see 15% extra credit above for keywords highlighting
- Type-Ahead Functionality - up to 33% extra credit: the UNH search site has this working correctly; as the user types, a drop-down box of search suggestions appears. In order to receive full credit, describe in the README.md how you implemented this functionality, what other mechanisms you considered, and why you chose this implementation. Also describe the potential problems if you truly host your search on the internet and thousands of users make type-ahead requests. Use of 3rd-party frameworks that you downloaded/installed and incorporated is acceptable. Calling another API over HTTP that you did not build yourself is NOT acceptable.
- Any sort of lemmatization/synonym expansion - up to 33% extra credit. You may already have done some research for this in one of the homeworks. You will receive full credit only if you build a scalable solution: right now it may only work for 10 or so words and their expansions, but the solution should still work if we needed to cover the entire English dictionary. Document your solution in README.md. Use of 3rd-party frameworks that you downloaded/installed and incorporated is acceptable.
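For the type-ahead extra credit, one possible server-side piece is a prefix lookup over a sorted vocabulary. The vocabulary would come from your index; the `suggest` name and the 5-suggestion limit are illustrative. The binary search keeps each lookup fast even for a dictionary-sized word list, which speaks to the scalability question the README.md must answer:

```python
from bisect import bisect_left

def suggest(vocab_sorted, prefix, limit=5):
    """Returns up to `limit` vocabulary words that start with `prefix`.

    `vocab_sorted` must be a pre-sorted list of lowercase words;
    bisect finds the first candidate in O(log n).
    """
    prefix = prefix.lower()
    start = bisect_left(vocab_sorted, prefix)
    matches = []
    for word in vocab_sorted[start:start + limit]:
        if not word.startswith(prefix):
            break  # sorted order: no later word can match either
        matches.append(word)
    return matches
```

Under real internet load you would also need caching and request debouncing on the client, since every keystroke otherwise becomes an HTTP request.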
Mini Project 3 - Space Invaders
You will be building a "Space Invaders" game in PyGame. There is a basic tutorial you can use as boilerplate at the URL below:
itnext.io/creating-space-invaders-clone-in-pygame-ea0f5336c677
Requirements:
Game components:
- Multiple levels in the game (at least 2) and the level should be displayed on the screen
- Multiple lives for the user displayed on the screen
- The user joystick displayed on the bottom of the screen and can be moved with key controls
- Obstacles (pink squares) between the user joystick and the enemies
- Enemies array
Levels: The user levels up when the score reaches a certain pre-set number, for example 100. At each subsequent level, one or more of the following complexities is added:
- Enemies move faster
- There are more enemies
- Enemies shoot back faster
- There are more obstacles
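The leveling rule above reduces to integer arithmetic that is easy to keep separate from the drawing code; every number in this sketch is an example, not a required value:

```python
LEVEL_UP_SCORE = 100  # illustrative pre-set threshold

def level_for(score):
    """Level 1 for scores 0-99, level 2 for 100-199, and so on."""
    return score // LEVEL_UP_SCORE + 1

def difficulty(level):
    """One way to scale the listed complexities per level (all numbers are examples)."""
    return {
        "enemy_speed": 1.0 + 0.25 * (level - 1),                   # enemies move faster
        "enemy_rows": 3 + (level - 1),                             # there are more enemies
        "enemy_shot_cooldown": max(0.2, 1.0 - 0.1 * (level - 1)),  # enemies shoot back faster
        "obstacle_count": 4 + (level - 1),                         # there are more obstacles
    }
```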
Enemies:
- Must move from left to right synchronously
- Must have different images for each row
- Number of rows is flexible
- Should not go out of the screen boundaries
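The synchronous left-right movement with boundary checks is plain geometry you can keep out of the PyGame drawing loop. The pixel values below (screen width, sprite width, drop distance) are illustrative defaults, and the step-down on direction flip is the classic Space Invaders behavior, not something the requirements mandate:

```python
def step_fleet(xs, direction, speed, left=0, right=800, width=32, drop=16):
    """Advances all enemy x positions synchronously by one frame.

    Returns (new_xs, new_direction, dy): when the fleet would cross a
    screen edge it stays put, reverses direction, and steps down by `drop`.
    """
    moved = [x + direction * speed for x in xs]
    if min(moved) < left or max(moved) + width > right:
        return xs, -direction, drop  # flip at the boundary instead of leaving the screen
    return moved, direction, 0
```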
User Joystick:
- Must be able to move from left to right and right to left using the key controls
- Must be able to shoot and destroy enemies (Extra Credit: 33% for different shooting abilities that produce different visual effects; for example, in addition to shooting bullets, the user can press a special key and shoot a bomb, killing more enemies at once)
Obstacles (pink boxes):
- Can either widen or grow in number at each level
- Neither the user nor the enemies can shoot through obstacles
Mini Project 4 - System Monitor
You will be building a system monitor similar to the bottom video here: www.nurmatova.com/lecture-12---websockets-system-monitoring-pygame.html
The monitor can be either a:
- Web Application (like at the URL above), suggested
- Desktop UI application, utilizing tools we will cover here: https://www.nurmatova.com/section-11---django-gui.html
If you choose to go with a Web Application:
- You will have to build your back-end API using a Python framework; any Python framework is accepted (e.g., Flask, Django).
- The data communication protocol can be REST or gRPC
- Extra credit: 33% for wrapping either the back-end API or the front-end in a Docker container
The monitor should run on your laptop and monitor the following metrics:
- Disk Utilization
- Network Traffic
- CPU utilization
- Memory Utilization
- Load Averages
- Internet signal strength
- Extra credit: 10% for each additional meaningful metric displayed in a meaningful chart
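A few of these metrics can be sampled with the standard library alone, as sketched below; the function name and the exact fields are illustrative, and a third-party library such as psutil covers the full list (memory, network traffic, CPU percent) far more easily:

```python
import os
import shutil
import time

def sample_metrics(path="."):
    """Samples a few of the required metrics using only the standard library.

    Disk utilization comes from shutil.disk_usage and load averages from
    os.getloadavg (Unix only, hence the guard).
    """
    usage = shutil.disk_usage(path)
    metrics = {
        "timestamp": time.time(),
        "disk_percent": 100 * usage.used / usage.total,
        "cpu_count": os.cpu_count(),
    }
    if hasattr(os, "getloadavg"):  # not available on Windows
        one, five, fifteen = os.getloadavg()
        metrics.update(load_avg_1m=one, load_avg_5m=five, load_avg_15m=fifteen)
    return metrics
```

A back-end endpoint would call this on a timer (or per request) and push the resulting dict to the front-end chart as JSON.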
Extra Credit:
- Scalable, flexible threshold definition and alerting mechanism (we will discuss this in class) +20%
- Visualizing thresholds and breaches +15%