
About

This is the website for ORIE 5270: Big Data Technologies and ORIE 6125: Computational Methods in Operations Research.

Summary

This course offers a broad overview of computational techniques and mathematical skills useful for data scientists. Topics include: UNIX shell, regular expressions, version control (git), data structures and algorithms, working with databases, data analysis using Python and related libraries (pandas, numpy / scipy, sklearn), parallel computing (Map-Reduce, Spark, Hadoop), an overview of standard machine learning and optimization algorithms, and, time permitting, a guided tour of functional programming.

Admin

Instructor: Vasilis Charisopoulos (vc333[at]cornell.edu) - OH: Wed 9-10pm, Fri 11am-12pm.

TAs:

  1. Sabhya Chhabria - OH: Tue 12-1pm
  2. Aahil Awatramani - OH: Tue 9-10pm

Lectures:

  1. ORIE 5270: Mondays and Wednesdays, 10:10 - 11:00am US EST, on Zoom.
  2. ORIE 6125: Mondays, Wednesdays, and Fridays, 10:10 - 11:00am US EST, on Zoom.

Zoom links for lectures and office hours are available in Canvas. Lecture recordings are also available under Canvas > Panopto Recordings.

Campuswire

Except for live lectures (Zoom) and lecture recordings (Canvas), we will be using Campuswire (instead of Piazza) for course announcements, lecture slides, and all other communication. Details for joining are available on the course Canvas page (under Modules > Campuswire). If you have not been able to enroll yet but would like to access Campuswire, please send me an email. Important: use your Cornell email address when emailing me.

Grade

ORIE 5270

Homework assignments make up the major portion of your grade (90%). All assignments are weighted equally.

The remaining 10% will be based on participation (being active on the class forum, completing the course and TA evaluations, etc.). You can fulfill this requirement in multiple ways: for example, answering 3-4 questions on Campuswire over the course of the semester and completing the course evaluations will be enough to get full participation credit. Since you might be following the class asynchronously, attending lectures is not required to obtain full participation points.

ORIE 6125

Your grade will be broken down as follows:

  • Homework (45%)
  • Final project (45%)
  • Participation (10%)

Homework

There will be a total of 7 homework assignments, released roughly every 2 weeks. Tentatively, they will be released and due on Fridays at 12pm (noon) US EST. We will use Gradescope for homework submissions.

If you submit the course evaluation at the end of the semester, you will be allowed to drop your lowest homework grade. You also get a total of 7 slip days throughout the semester that you can use to turn in assignments late. These are meant to help you in case of personal emergencies, travel, job interviews, or minor illness. You are responsible for keeping track of how many slip days you have used.

Note: if you fall ill due to COVID, you can reach out to SDS to request COVID-related accommodations. Extensions warranted by a documented COVID case (or any other serious illness) will not "use up" any of your slip days.
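To make the ORIE 5270 weighting and the drop-lowest rule concrete, here is a minimal Python sketch. It assumes that every homework score and the participation score are percentages out of 100, and that the dropped assignment is simply removed before averaging; the function name `orie5270_grade` is hypothetical and is not an official grading tool.

```python
def orie5270_grade(homework, participation, submitted_eval):
    """Sketch of the ORIE 5270 weighting: 90% homework, 10% participation.

    Assumptions (not official policy): all scores are percentages in
    [0, 100], and the lowest homework is removed before averaging when
    the course evaluation was submitted.
    """
    scores = sorted(homework)
    if submitted_eval and len(scores) > 1:
        scores = scores[1:]  # drop the single lowest homework score
    homework_avg = sum(scores) / len(scores)  # assignments weighted equally
    return 0.9 * homework_avg + 0.1 * participation


# Seven homework scores; the 60 is dropped because the evaluation was submitted.
print(round(orie5270_grade([80, 90, 100, 70, 85, 95, 60], 100, True), 2))  # 88.0
```

Sorting first makes "drop the lowest" a simple slice, and the guard on `len(scores) > 1` avoids dividing by zero if only one assignment has been graded.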

Regrade requests

For each assignment, you will be able to submit regrade requests (via Gradescope's interface) for up to a week after its grades are released.

Final project (ORIE 6125)

A major component of 6125 is a final project, where you will have a chance to combine tools and techniques we will go over throughout the semester.

Goals

The goal is for you to create (or extend) a project utilizing several concepts, tools and techniques covered in class. For example, your project could involve:

  • Version control (preferably git)
  • Unit testing for new features
  • Documentation for any public APIs, if applicable
  • HTML documentation and visualizations for the project
  • Attention to performance, where relevant (you don't need to extensively tune your project, but you should e.g. make informed decisions about the kind of data structures you use in core routines)
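As one illustration of "unit testing for new features", the sketch below uses Python's standard `unittest` module to test a small feature. The `moving_average` helper is hypothetical and only stands in for whatever new functionality your project adds.

```python
import unittest


def moving_average(values, window):
    """Return the mean of each consecutive `window`-sized slice of `values`.

    Hypothetical example feature; not part of any course code.
    """
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


class TestMovingAverage(unittest.TestCase):
    def test_basic(self):
        self.assertEqual(moving_average([1, 2, 3, 4], 2), [1.5, 2.5, 3.5])

    def test_window_equal_to_length(self):
        self.assertEqual(moving_average([2, 4, 6], 3), [4.0])

    def test_invalid_window(self):
        # Edge cases deserve tests too: an oversized window should fail loudly.
        with self.assertRaises(ValueError):
            moving_average([1, 2], 3)
```

Saved in a file whose name starts with `test`, these tests can be discovered and run with `python -m unittest`; each new feature would get its own small `TestCase` like this one.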

Your project can be related to your research, and need not be started from scratch. If there is a problem you have been working on and it needs a computational study, now might be the time to do it! Likewise, if you previously created a library or program to solve a certain problem, you can extend that project with new functionality, improved performance, updated documentation, and more complete unit tests.

Dates & Deliverables

There are three deliverables: a project proposal, a short writeup / report of what you did, and the source code and other components of the project itself.

Project proposal: by Friday, April 16th (extended from March 31st), you should have a project proposal hosted in a repository for your project (preferably on Cornell's GitHub) describing:

  • List of team members

  • Short motivation for the project

  • Description of expected deliverables (library / API / website or whatever else is applicable to your project)

  • Description of tools you expect to use

Please email me (vc333[at]cornell.edu) a link to your repository using the subject line "ORIE 6125 Project" by the aforementioned due date.

Final report: By a date TBA (most likely the end of exams), you should provide the following in your project's repository:

  • A document explaining the project (no more than 3-4 pages), including:

    • Motivation
    • Implementation details and challenges
    • Computational results, if any - timing tests, problems solved, etc.
    • Potential future directions
  • Documentation

    This will depend on the project and the language you chose to implement it in, but there should be a set of documentation that can be accessed separately from the source code. Ideally, this documentation should be automatically generated from the source code using some documentation tool (e.g. pydoc or Sphinx).

  • Source code

    You should include source files, test files, and a README file explaining how to use your software / get started.

  • (Optional) A brief description with links to the documentation (in HTML). This will be hosted on the web page for future courses to reference.
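For documentation generated automatically from source code, the idea is that well-structured docstrings become the documentation. Below is a minimal sketch using Python's built-in `pydoc` module; the `clip` function is a hypothetical example, and the docstring layout (Args / Raises) is just one common convention, not a course requirement.

```python
import pydoc


def clip(x, lo, hi):
    """Clamp `x` to the closed interval [lo, hi].

    Args:
        x: the value to clamp.
        lo: lower bound.
        hi: upper bound; must satisfy lo <= hi.

    Raises:
        ValueError: if lo > hi.
    """
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    return max(lo, min(x, hi))


# Render the docstring-based documentation as plain text (no terminal bolding).
doc_text = pydoc.render_doc(clip, renderer=pydoc.plaintext)
print(doc_text)
```

For a whole module, `python -m pydoc -w mymodule` writes an HTML page (`mymodule.html`) from the same docstrings. Sphinx, via its `autodoc` extension, reads the same docstrings and produces a more complete, browsable HTML site.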

Past projects

Here are some example projects from past iterations of the course.

A Julia implementation of the MRG32k3a random number generator proposed by Pierre L'Ecuyer. Allows creation of multiple statistically independent random number streams and substreams.

Implementation of inverse Laplace transforms, fast Fourier transforms, and their inverses.

Implementation of a support vector machine that allows rejection of outliers.