Hamilton Dataflows

Welcome!

This website curates a collection of Hamilton Dataflows that are ready to use in your own projects. They are user-contributed and user-maintained, with the goal of making it easier for you to get started with Hamilton.

We expect this collection to grow over time, so check back often! As dataflows mature, we will move them into the official DAGWorks sub-package of this site, where they will be maintained by DAGWorks's Hamilton team.

☝️ Use the search bar above to quickly find dataflows based on keyword search.

👈 On the left-hand side you can find dataflows organized by User, DAGWorks, and tags.

Usage

There are two ways to access the dataflows presented here.

Assumptions:

You are familiar with Hamilton and have it installed. If not, take 15 minutes to learn Hamilton in your browser, then run pip install sf-hamilton to get started. Come back here when you're ready to use Hamilton.
You have the requisite Python dependencies installed on your system; you'll get import errors if you don't. Don't know what you need? Scroll to the bottom of a dataflow's page to find its requirements. We're working on convenience functions to help!
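Until those convenience functions exist, you can check a dataflow's requirements yourself. The sketch below assumes the requirements are ordinary pip-style lines such as "pandas" or "pandas>=2.0". It reports which distributions are not installed. `missing_requirements` is a hypothetical helper, not part of Hamilton. It checks only whether each package is installed by name. It does not evaluate version constraints, extras, or environment markers.

```python
import re
from importlib import metadata


def missing_requirements(requirement_lines):
    """Return distribution names from pip-style requirement lines that are not installed.

    Hypothetical helper: only the package name is checked; version constraints,
    extras, and environment markers are ignored.
    """
    missing = []
    for line in requirement_lines:
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        # The distribution name ends at the first extras, version, marker, or space character.
        name = re.split(r"[\[<>=!~;\s]", line, maxsplit=1)[0]
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, `missing_requirements(open("requirements.txt").read().splitlines())` lists what to pip install before importing the dataflow. Here `requirements.txt` stands for wherever you saved the dataflow's requirements.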

For more extensive documentation, please see the Hamilton User Contrib documentation.

Dynamic installation

Here we dynamically download the dataflow from the internet and execute it. This is useful for quickly iterating in a notebook and pulling in just the dataflow you need.

from hamilton import dataflows, driver

# download into ~/.hamilton/dataflows and load the module -- WARNING: ensure you know what code you're importing!
# NAME_OF_DATAFLOW = dataflows.import_module("NAME_OF_DATAFLOW")  # if official dataflow
NAME_OF_DATAFLOW = dataflows.import_module("NAME_OF_DATAFLOW", "NAME_OF_USER")
dr = (
    driver.Builder()
    .with_config({})  # replace with configuration as appropriate
    .with_modules(NAME_OF_DATAFLOW)
    .build()
)
# execute the dataflow, specifying what you want back. Will return a dictionary.
result = dr.execute(
    [NAME_OF_DATAFLOW.FUNCTION_NAME, ...],  # this specifies what you want back
    inputs={...}  # pass in inputs as appropriate
)

See this notebook for an example.
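As the warning in the snippet above says, you should know what code you are importing. One way to review a downloaded dataflow without running it is to parse its source with Python's `ast` module. The sketch below lists the top-level functions the file defines and the modules it imports. The helper names are hypothetical. The exact file layout under ~/.hamilton/dataflows is not described here, so point the helpers at the dataflow's source file yourself.

```python
import ast
from pathlib import Path


def list_dataflow_functions(source_path):
    """Return the names of top-level functions in a Python source file.

    The file is parsed, not executed, so none of its code runs.
    """
    tree = ast.parse(Path(source_path).read_text(encoding="utf-8"))
    return [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]


def list_imports(source_path):
    """Return the module names a Python source file imports, without running it."""
    tree = ast.parse(Path(source_path).read_text(encoding="utf-8"))
    names = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.append(node.module)
    return names
```

The function names are also the node names you can request in `dr.execute`. The import list tells you which dependencies the dataflow pulls in.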

Static installation

This approach requires installing the package on your system. It is the recommended path for production, because you can version-lock your dependencies.

To install the package, run:

pip install sf-hamilton-contrib --upgrade

Once installed, you can import the dataflows as follows.

Things you need to know:

Whether it's a user or an official dataflow and, if it's a user dataflow, the user's name.
The name of the dataflow.

from hamilton import driver
# from hamilton.contrib.dagworks import NAME_OF_DATAFLOW
from hamilton.contrib.user.NAME_OF_USER import NAME_OF_DATAFLOW

dr = (
    driver.Builder()
    .with_config({})  # replace with configuration as appropriate
    .with_modules(NAME_OF_DATAFLOW)
    .build()
)
# execute the dataflow, specifying what you want back. Will return a dictionary.
result = dr.execute(
    [NAME_OF_DATAFLOW.FUNCTION_NAME, ...],  # this specifies what you want back
    inputs={...}  # pass in inputs as appropriate
)
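The two import lines above differ only in their package prefix: `hamilton.contrib.dagworks` for official dataflows and `hamilton.contrib.user.NAME_OF_USER` for user dataflows. If you choose a dataflow by name at runtime, for example from a config file, a small helper can build that path from the facts listed above and import it with `importlib`. The helpers below are hypothetical and not part of sf-hamilton-contrib.

```python
import importlib


def contrib_module_path(dataflow_name, user=None):
    """Build the import path of an installed sf-hamilton-contrib dataflow.

    Official dataflows (user=None) live under hamilton.contrib.dagworks;
    user dataflows live under hamilton.contrib.user.<user>.
    """
    if user is None:
        return f"hamilton.contrib.dagworks.{dataflow_name}"
    return f"hamilton.contrib.user.{user}.{dataflow_name}"


def load_contrib_dataflow(dataflow_name, user=None):
    """Import an installed dataflow by name; raises ImportError if it is not installed."""
    return importlib.import_module(contrib_module_path(dataflow_name, user))
```

The returned module can be passed to `driver.Builder().with_modules(...)` exactly like the statically imported one. The names "my_flow" and "alice" below are placeholders: `contrib_module_path("my_flow", "alice")` gives "hamilton.contrib.user.alice.my_flow".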

Modification

Getting started is one thing; modifying a dataflow to fit your needs is another. So we have a prescribed flow that lets you copy a dataflow's code to a location of your choosing, where you can modify it as you see fit.

Run this in a notebook or Python script to copy the dataflow to a directory of your choosing.

from hamilton import dataflows

# dynamically pull and then copy
NAME_OF_DATAFLOW = dataflows.import_module("NAME_OF_DATAFLOW", "NAME_OF_USER")
dataflows.copy(NAME_OF_DATAFLOW, destination_path="PATH_TO_DIRECTORY")
# copy from the installed library
from hamilton.contrib.user.NAME_OF_USER import NAME_OF_DATAFLOW
dataflows.copy(NAME_OF_DATAFLOW, destination_path="PATH_TO_DIRECTORY")

You can then modify/import the code as you see fit. See copy() for more details.
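Once the code is copied, you import your modified version instead of the original. The copied directory may not be on your Python path. In that case, one option is to load the source file directly with `importlib.util`. The sketch below assumes you know which file inside `PATH_TO_DIRECTORY` holds the dataflow's functions; check the destination directory after calling `copy()`, because its layout is not described here. `load_module_from_path` is a hypothetical helper.

```python
import importlib.util
import sys


def load_module_from_path(file_path, module_name):
    """Import a Python source file as a module registered under module_name."""
    spec = importlib.util.spec_from_file_location(module_name, str(file_path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {file_path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so the module is importable by name while it loads.
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module
```

The returned module is an ordinary Python module, so it can go into `driver.Builder().with_modules(...)` in place of the original dataflow.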

How to contribute

If you have a dataflow that you would like to share with the community, please submit a pull request to this repository. We will review your dataflow, and if it meets our standards we will add it to the package. To submit a pull request, please use this link, which takes you to the specific PR template.

Dataflow standards

We want to ensure that the dataflows in this package are of high quality and are easy to use. To that end, we have a set of standards that we expect all dataflows to meet. If you have any questions, please reach out.

Standards:

The dataflow must be a valid Python module.
It must not do anything malicious.
It must be well documented.
It must work.
It must follow our standard structure as outlined below.
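Two of these standards can be self-checked mechanically before you open a pull request: the module must be valid Python, and it should be documented. The sketch below is a hypothetical pre-submission check, not the project's review tooling. It parses the file with `ast`, then reports a missing module docstring and public top-level functions without docstrings. Underscore-prefixed helpers are skipped, since Hamilton does not turn them into nodes. It cannot tell you whether the dataflow works or is safe; those still need testing and review.

```python
import ast
from pathlib import Path


def check_dataflow_source(source_path):
    """Return a list of problems found in a dataflow module's source.

    Checks only that the file parses as Python and that the module and its
    public top-level functions have docstrings.
    """
    try:
        tree = ast.parse(Path(source_path).read_text(encoding="utf-8"))
    except SyntaxError as exc:
        return [f"not valid Python: {exc.msg} (line {exc.lineno})"]
    problems = []
    if ast.get_docstring(tree) is None:
        problems.append("module has no docstring")
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Underscore-prefixed functions are private helpers, so skip them.
            if not node.name.startswith("_") and ast.get_docstring(node) is None:
                problems.append(f"function {node.name!r} has no docstring")
    return problems
```

An empty list means the file passed these two checks.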

Checklist for new dataflows:

Do you have the following?

Got questions?

Join our Slack community to chat, ask questions, and more.
