All mini-projects are due May 7th. Mini-projects are worth 66.6 points total (the value of the midterm exam).
You will be given 4 Mini-Project Assignments (below). You must stick to the instructions on each mini-project assignment; for example, you will be building PyGame Space Invaders to very specific instructions, and you cannot deviate from those instructions or ask to build another game:
- Crawler and File Traverser
- Web search engine interface
- System Monitor with Web or Desktop UI
- PyGame
You can choose 1 mini-project yourself that is not listed above; it should be very different from the projects listed above. If you choose to do that, you will have to provide a description by April 10th and stick to that description. Your project will be graded against the description you provide; if it does not match the description, it will be graded 0.
You need to submit a total of 3 or 4 mini-projects. Each mini-project will have plenty of extra-credit work, so if you choose to submit 3, you will need to do extra-credit work worth at least 33% for each project.
All mini-project installation and run instructions will need to be validated by another student in the class. The "validators" will receive 5% extra credit for each project they validate, provided I have no issues following the instructions when I try running the project.
We will continue to have quizzes, each worth 10 points, covering material from the previous lecture. They will be simpler after the [objects and exceptions] material because the course content will become more complex and open-ended. Not all classes will have quizzes.
If I find that you copied code from the internet for your projects (please trust me on this one: I am very good at recognizing whether a student did their own work and at finding copied content; changing function names, colors, variable names, printed text, and comments won't help), you will receive 0 on the project. I will not forgive plagiarized submissions, so if you receive 0 because you copied code, I won't listen to excuses. You get only one chance to submit, so make it right the first time.
Mini Project 1 - Indexer, Crawler, File Traverser and Search
The File Traverser is optional. You will be building a mini search engine.
You may be able to reuse the crawler program and the keyword search that you built as one of your homework programs.
Use http://newhaven.edu/search/ as inspiration and validation.
The scope of this project is:
- Your root directory must have a README.md with instructions on how to install and run the project. You will need to have another student from the same class validate your instructions by installing and running your project and working with you to improve the instructions. The student who does the validation receives 5% extra credit if I am able to run the project without any issues. A single student can do validations for multiple other students.
- Build a crawler that crawls all the pages in the newhaven.edu domain. There are about 26,100 pages. Use a * search to see all pages on http://newhaven.edu/search/. If you want to research existing Python crawler frameworks and use those, that's OK! If you consider multiple ways to crawl the content and document the trade-off analysis as part of your README.md, you will receive up to 20% extra credit. I will explain this in class.
- You may want to introspect the Google results behind the scenes to inform decisions about what you should and should not extract from a page when crawling it. I will show in class how to retrieve that information. Here is one example of the Google document returned:
```json
{
  "cacheUrl": "http://www.google.com/search?q=cache:NK4cDcsUUYUJ:www.newhaven.edu",
  "clicktrackUrl": "https://www.google.com/url?client=internal-element-cse&cx=017404113844510084297:wxasmbgdgl0&q=https://www.newhaven.edu/&sa=U&ved=2ahUKEwjqu8j-0b7vAhWbZs0KHVyyC2MQFjAAegQIBRAB&usg=AOvVaw1T26xXIgOiASa7-BCy7YTm",
  "content": "The University of New Haven, founded on the Yale campus in 1920, founded on \nthe Yale campus in 1920, is a private, coeducational university situated on the ...",
  "contentNoFormatting": "The University of New Haven, founded on the Yale campus in 1920, founded on \nthe Yale campus in 1920, is a private, coeducational university situated on the ...",
  "title": "University of New Haven: Home",
  "titleNoFormatting": "University of New Haven: Home",
  "formattedUrl": "https://www.newhaven.edu/",
  "unescapedUrl": "https://www.newhaven.edu/",
  "url": "https://www.newhaven.edu/",
  "visibleUrl": "www.newhaven.edu",
  "richSnippet": {
    "cseImage": {
      "src": "https://www.newhaven.edu/_resources/images/hero/charger-statue-snow-2021.jpg",
      "width": "150",
      "type": "0",
      "height": "80"
    },
    "metatags": {
      "twitterCard": "summary_large_image",
      "twitterSite": "@unewhaven",
      "twitterTitle": "Home - University of New Haven",
      "viewport": "width=device-width, initial-scale=1",
      "twitterDescription": "The University of New Haven, founded on the Yale campus in 1920, founded on the Yale campus in 1920, is a private, coeducational university situated on the coast of southern New England in West Haven, Connecticut. It’s a diverse and vibrant community of 7,000 students, with campuses across the country and around the world. Within our colleges and schools, students immerse themselves in a transformative, career-focused education across the liberal arts and sciences, fine arts, business, engineering, healthcare, public safety, and public service. We offer more than 100 academic programs, all grounded in a long-standing commitment to collaborative, interdisciplinary, project-based learning.",
      "twitterImage": "https://www.newhaven.edu/_resources/images/hero/charger-statue-snow-2021.jpg",
      "ogTitle": "Home - University of New Haven",
      "ogDescription": "The University of New Haven, founded on the Yale campus in 1920, founded on the Yale campus in 1920, is a private, coeducational university situated on the coast of southern New England in West Haven, Connecticut. It’s a diverse and vibrant community of 7,000 students, with campuses across the country and around the world. Within our colleges and schools, students immerse themselves in a transformative, career-focused education across the liberal arts and sciences, fine arts, business, engineering, healthcare, public safety, and public service. We offer more than 100 academic programs, all grounded in a long-standing commitment to collaborative, interdisciplinary, project-based learning.",
      "ogSiteName": "University of New Haven",
      "ogImage": "https://www.newhaven.edu/_resources/images/hero/charger-statue-snow-2021.jpg",
      "ogType": "website"
    }
  },
  "breadcrumbUrl": {
    "host": "www.newhaven.edu"
  }
}
```
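To make the crawling bullet concrete, here is a minimal standard-library sketch. The names (`LinkExtractor`, `in_domain`, `crawl`) and the `max_pages` cap are illustrative, not part of the assignment, and a real crawler would also want politeness delays and robots.txt handling — which is exactly where a framework comparison in your README.md earns the extra credit.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

DOMAIN = "newhaven.edu"  # the crawl must stay inside this domain

class LinkExtractor(HTMLParser):
    """Collects absolute link URLs from the <a href="..."> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # resolve relative links against the page URL
                    self.links.append(urljoin(self.base_url, value))

def in_domain(url):
    """True if the URL's host is newhaven.edu or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == DOMAIN or host.endswith("." + DOMAIN)

def crawl(seed, handle_page, max_pages=50):
    """Breadth-first crawl from `seed`, handing (url, html) to an indexer callback."""
    seen, queue = {seed}, deque([seed])
    while queue and len(seen) <= max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except OSError:
            continue  # unreachable page: skip it
        handle_page(url, html)
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            if in_domain(link) and link not in seen:
                seen.add(link)
                queue.append(link)
```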
- Store crawled content any way you like, but document in README.md what you used as storage, approximately how many bytes you will need to store ~26,100 pages, and how you calculated that. The part of the program that stores the data has to be in a separate class from the crawler; we will call the program that stores the content the indexer, and the stored content the index. To store the data you can use:
- a NoSQL datastore. I won't mind if you use a search engine as your data store! You may get a lot of hidden benefits if you go that route. Look into https://lucene.apache.org/ and some search engines built on top of it: https://www.elastic.co/elasticsearch/ and https://solr.apache.org/. Those are the leaders in the industry. https://neo4j.com/ is a graph DB; it's not a search engine, but it also utilizes indexing underneath and may add interesting analytical capabilities to your search.
- an in-memory datastore. Something like a data structure of custom objects or a dictionary will be sufficient. You may want to consider persisting your structure to a file (pickle, maybe?) so that you don't need to re-crawl the content every time you shut down the program.
- a SQL datastore.
- Implement search over the content. Multiple keywords must be supported; it is up to you how. Use the UNH site search for inspiration.
- Results must be returned as JSON with the number of results and the time the search took to retrieve them.
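One minimal shape for the indexer/index split and the required JSON output, assuming an in-memory inverted index and AND semantics for multiple keywords — both are illustrative choices, not requirements:

```python
import json
import time

class Indexer:
    """Separate storage class: keeps pages plus an inverted index (word -> set of URLs)."""

    def __init__(self):
        self.pages = {}  # url -> title
        self.index = {}  # word -> set of urls containing that word

    def add(self, url, title, text):
        self.pages[url] = title
        for word in text.lower().split():
            self.index.setdefault(word, set()).add(url)

    def search(self, *keywords):
        """AND-combines the keywords; returns JSON with result count and elapsed time."""
        start = time.perf_counter()
        hits = None
        for word in keywords:
            urls = self.index.get(word.lower(), set())
            hits = urls if hits is None else hits & urls
        results = [{"url": u, "title": self.pages[u]} for u in sorted(hits or set())]
        return json.dumps({
            "total": len(results),
            "time_seconds": round(time.perf_counter() - start, 6),
            "results": results,
        })
```

Swapping this class for Elasticsearch, Solr, or a SQL table only changes the internals; the crawler keeps calling the same `add` method, which is the point of keeping the indexer separate.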
File Traverser - Optional, 50% Extra Credit
The file traverser is similar to the crawler, but uses a folder on your drive as a seed and traverses all files in that directory, handing them off to the indexer.
The URL to the file content is replaced with the file URI scheme: https://en.wikipedia.org/wiki/File_URI_scheme
Since there is no file content on the UNH site, you can either add random files with text or perhaps your Python homework programs. The file-traversed content can be mixed with crawled content.
Mini Project 2 - Web Search Engine Interface
You will be building a search engine interface over the searchable content that you created in Mini-Project 1. Your interface should be similar to: newhaven.edu/search/
You should have the following components on the page:
- Search Text Box - where the user will enter the keywords they want to search for
- Search Button - the trigger for search
- Number of results and the time it took to collect those results, e.g.: About 28,100 results (0.24 seconds)
- Pagination
Search API Requirements:
- The URL with the search keywords should be bookmarkable. The search keywords can be passed via the URL (not only in the text box on the page). Please see the examples below.
- The URL should also accept the number of results to display and the offset; pagination should use this functionality to render the pages. (This does not work as I'd expect on the UNH website.) &num_results=10 would return 10 results per page; &offset=20 would skip the first 20 results and start displaying from result #21. Please note that your results must be sorted in some way for this functionality to work. As a result, if I bookmark search results from page 5, I should be returned to page 5 of the search results for the keywords I entered when I come back to that URL.
- Make sure the API doesn't break if no results are found or bad input was given
- The API should return:
- Total number of results
- The time it took to fetch all the results
- Number of results returned
- Offset
- Results, containing at least URL and Title. Extra Credit: +15% if you highlight the keywords the user searched for in the results (this doesn't work correctly on the UNH search site, but if you try searching for something in Google, you will see your keywords highlighted in the results)
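The required response fields can be assembled independently of any web framework. This sketch assumes hits are dicts with url/title keys (as in Mini-Project 1) and shows one way to keep bad `num_results`/`offset` input from breaking the API; the function name and default values are illustrative:

```python
import time

def search_response(all_hits, num_results=10, offset=0):
    """Builds one page of the API payload from the full hit list.

    Sorting by URL keeps the order stable, so a bookmarked offset
    always lands on the same page of results.
    """
    try:
        num_results = max(1, int(num_results))
        offset = max(0, int(offset))
    except (TypeError, ValueError):
        num_results, offset = 10, 0  # bad input: fall back to defaults instead of breaking
    start = time.perf_counter()
    ordered = sorted(all_hits, key=lambda hit: hit["url"])
    page = ordered[offset:offset + num_results]
    return {
        "total": len(ordered),
        "time_seconds": time.perf_counter() - start,
        "num_results": len(page),
        "offset": offset,
        "results": page,
    }
```

A Flask or Django view would simply parse `num_results` and `offset` from the query string, call this function, and serialize the dict as JSON.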
Search Text Box Requirements:
- Search should work if the user clicks the "Search" button or hits Enter (this works on the UNH site)
Pagination:
- Similar to the UNH Site, dynamically calculated based on the number of results and results per page settings.
- The current selected page is highlighted and dynamically calculated based on the num_results and offset parameters
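The two pagination rules above reduce to a little arithmetic on `num_results` and `offset`; the 10-link window in this sketch is an assumption modeled on typical search UIs, not a requirement:

```python
import math

def pagination(total, num_results, offset, window=10):
    """Derives the page strip and the highlighted page from num_results/offset."""
    total_pages = max(1, math.ceil(total / num_results))
    current = offset // num_results + 1  # pages are 1-based
    # show at most `window` page links, centred on the current page
    first = max(1, min(current - window // 2, total_pages - window + 1))
    last = min(total_pages, first + window - 1)
    return {"pages": list(range(first, last + 1)), "current": current}
```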
Extra Credits:
- Please see 15% extra credit above for keywords highlighting
- Type-Ahead Functionality - up to 33% extra credit: the UNH search site has this working correctly; as the user types, a drop-down box of search suggestions appears. In order to receive full credit, describe in the README.md how you implemented this functionality, what other mechanisms you considered, and why you chose this implementation. Also describe the potential problems if you truly host your search on the internet and thousands of users make type-ahead requests. Use of 3rd-party frameworks that you downloaded/installed and incorporated is acceptable. Calling another API over HTTP that you did not build yourself is NOT acceptable.
- Any sort of lemmatization/synonym expansion - up to 33% extra credit. You may already have done some research for this in one of the homeworks. You will receive full credit only if you build a scalable solution: right now it may only work for 10 or so words and their expansions, but the solution should still work if we needed to cover the entire English dictionary. Document your solution in README.md. Use of 3rd-party frameworks that you downloaded/installed and incorporated is acceptable.
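For the type-ahead extra credit, one possible server-side piece is a prefix lookup over a sorted vocabulary. The vocabulary would come from your index; the `suggest` name and the 5-suggestion limit are illustrative. The binary search keeps each lookup fast even for a dictionary-sized word list, which speaks to the scalability question the README.md must answer:

```python
from bisect import bisect_left

def suggest(vocab_sorted, prefix, limit=5):
    """Returns up to `limit` vocabulary words that start with `prefix`.

    `vocab_sorted` must be a pre-sorted list of lowercase words;
    bisect finds the first candidate in O(log n).
    """
    prefix = prefix.lower()
    start = bisect_left(vocab_sorted, prefix)
    matches = []
    for word in vocab_sorted[start:start + limit]:
        if not word.startswith(prefix):
            break  # sorted order: no later word can match either
        matches.append(word)
    return matches
```

Under real internet load you would also need caching and request debouncing on the client, since every keystroke otherwise becomes an HTTP request.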
Mini Project 3 - Space Invaders
You will be building a "Space Invaders" game in PyGame. There is a basic tutorial you can use as boilerplate at the URL below:
itnext.io/creating-space-invaders-clone-in-pygame-ea0f5336c677
Requirements:
Game components:
- Multiple levels in the game (at least 2) and the level should be displayed on the screen
- Multiple lives for the user displayed on the screen
- The user joystick displayed on the bottom of the screen and can be moved with key controls
- Obstacles (pink squares) between the user joystick and the enemies
- Enemies array
Levels: The user levels up when the score reaches a certain pre-set number, for example 100. At each subsequent level, one or more of the following complexities is added:
- Enemies move faster
- There are more enemies
- Enemies shoot back faster
- There are more obstacles
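The leveling rule above reduces to integer arithmetic that is easy to keep separate from the drawing code; every number in this sketch is an example, not a required value:

```python
LEVEL_UP_SCORE = 100  # illustrative pre-set threshold

def level_for(score):
    """Level 1 for scores 0-99, level 2 for 100-199, and so on."""
    return score // LEVEL_UP_SCORE + 1

def difficulty(level):
    """One way to scale the listed complexities per level (all numbers are examples)."""
    return {
        "enemy_speed": 1.0 + 0.25 * (level - 1),                   # enemies move faster
        "enemy_rows": 3 + (level - 1),                             # there are more enemies
        "enemy_shot_cooldown": max(0.2, 1.0 - 0.1 * (level - 1)),  # enemies shoot back faster
        "obstacle_count": 4 + (level - 1),                         # there are more obstacles
    }
```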
Enemies:
- Must move from left to right synchronously
- Must have different images for each row
- Number of rows is flexible
- Should not go out of the screen boundaries
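The synchronous left-right movement with boundary checks is plain geometry you can keep out of the PyGame drawing loop. The pixel values below (screen width, sprite width, drop distance) are illustrative defaults, and the step-down on direction flip is the classic Space Invaders behavior, not something the requirements mandate:

```python
def step_fleet(xs, direction, speed, left=0, right=800, width=32, drop=16):
    """Advances all enemy x positions synchronously by one frame.

    Returns (new_xs, new_direction, dy): when the fleet would cross a
    screen edge it stays put, reverses direction, and steps down by `drop`.
    """
    moved = [x + direction * speed for x in xs]
    if min(moved) < left or max(moved) + width > right:
        return xs, -direction, drop  # flip at the boundary instead of leaving the screen
    return moved, direction, 0
```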
User Joystick:
- Must be able to move from left to right and right to left using the key controls
- Must be able to shoot and destroy enemies (Extra Credit: 33% for different shooting abilities that produce different visual effects; for example, in addition to shooting bullets, the user can press a special key and shoot a bomb, killing more enemies at once)
Obstacles (pink boxes):
- Can either widen or grow in number at each level
- Neither the user nor the enemies can shoot through obstacles
Mini Project 4 - System Monitor
You will be building a system monitor similar to the bottom video here: www.nurmatova.com/lecture-12---websockets-system-monitoring-pygame.html
The monitor can be either a:
- Web Application (like at the URL above), suggested
- Desktop UI application, utilizing tools we will cover here: https://www.nurmatova.com/section-11---django-gui.html
If you choose to go with a Web Application:
- You will have to build your back-end API using a Python framework; any Python framework is accepted (e.g., Flask, Django).
- The data communication protocol can be REST or gRPC
- Extra credit: 33% for wrapping either the back-end API or the front-end in a Docker container
The monitor should run on your laptop and monitor the following metrics:
- Disk Utilization
- Network Traffic
- CPU utilization
- Memory Utilization
- Load Averages
- Internet signal strength
- Extra credit: 10% for each additional meaningful metric displayed in a meaningful chart
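A few of these metrics can be sampled with the standard library alone, as sketched below; the function name and the exact fields are illustrative, and a third-party library such as psutil covers the full list (memory, network traffic, CPU percent) far more easily:

```python
import os
import shutil
import time

def sample_metrics(path="."):
    """Samples a few of the required metrics using only the standard library.

    Disk utilization comes from shutil.disk_usage and load averages from
    os.getloadavg (Unix only, hence the guard).
    """
    usage = shutil.disk_usage(path)
    metrics = {
        "timestamp": time.time(),
        "disk_percent": 100 * usage.used / usage.total,
        "cpu_count": os.cpu_count(),
    }
    if hasattr(os, "getloadavg"):  # not available on Windows
        one, five, fifteen = os.getloadavg()
        metrics.update(load_avg_1m=one, load_avg_5m=five, load_avg_15m=fifteen)
    return metrics
```

A back-end endpoint would call this on a timer (or per request) and push the resulting dict to the front-end chart as JSON.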
Extra Credit:
- Scalable, flexible threshold definition and alerting mechanism (we will discuss this in class) +20%
- Visualizing thresholds and breaches +15%