Adding a Data Source

Once you’ve designed your data source, you can add it to Amorphous.

Create Amorphous data definition file

First, create an Amorphous definition file that describes your data, so that Amorphous knows how to treat each column. Here’s an example amorphous.yaml file.

data_sources:
    - name: product_user_analytics
      columns:
        timestamp:
          - index
        user_id:
          - index
        feat1_usage:
          - independent
          - whatif
        feat2_usage:
          - independent
          - whatif
        random:
          - ignore
        device:
          - independent
        daily_consumption:
          - target

Each entry in the columns attribute must have one of the following values:

index, specifying an index column. Index columns are used to manage the grain of the training and predictions.
independent, specifying independent variables.
target, specifying a target variable.

In addition, independent columns can take one additional attribute, whatif, which specifies that the column can be used in whatif modeling. In general, use this attribute on “actionable” independent variables. For example, you may be able to increase clicks on a specific button, but you probably cannot change a user’s device type.

Add the ignore attribute to any column you want the model to ignore.
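
The attribute rules above can be checked before uploading. The following is a minimal sketch and not part of Amorphous: the function name validate_columns is hypothetical, and it assumes that every column carries exactly one of index, independent, target, or ignore, and that whatif is only valid alongside independent. The columns mapping is written as a Python dict that mirrors the example file; in practice it could be loaded from amorphous.yaml with a YAML parser.

```python
# Assumed rule set: one role per column, plus the optional whatif modifier.
ROLES = {"index", "independent", "target", "ignore"}
MODIFIERS = {"whatif"}


def validate_columns(columns):
    """Return a list of problems found in a {column: [attributes]} mapping."""
    problems = []
    for name, attrs in columns.items():
        attrs = list(attrs or [])
        unknown = [a for a in attrs if a not in ROLES | MODIFIERS]
        if unknown:
            problems.append(f"{name}: unknown attribute(s) {unknown}")
        roles = [a for a in attrs if a in ROLES]
        if len(roles) != 1:
            problems.append(f"{name}: expected exactly one role, found {roles}")
        if "whatif" in attrs and "independent" not in attrs:
            problems.append(f"{name}: whatif requires independent")
    return problems


# Mirrors the example amorphous.yaml shown earlier.
example = {
    "timestamp": ["index"],
    "user_id": ["index"],
    "feat1_usage": ["independent", "whatif"],
    "feat2_usage": ["independent", "whatif"],
    "random": ["ignore"],
    "device": ["independent"],
    "daily_consumption": ["target"],
}

# Hypothetical broken definitions, for illustration only.
bad = {"device": ["whatif"], "clicks": ["independent", "target"]}

for problem in validate_columns(example) + validate_columns(bad):
    print(problem)
```

The example mapping produces no problems, while the deliberately broken one reports a missing role, a whatif without independent, and a column with two roles.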

Amorphous currently detects the following types automatically in the independent variables:

integers
booleans
floating points
“categorical” strings, i.e., string values that represent categories

Notably, if an independent variable is a free-text string rather than a category (e.g., the subject line of an email), you will have to mark that variable with ignore. This limitation will be removed in a future version of Amorphous.
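
To decide which string columns to mark with ignore, a rough pre-check can help. The sketch below is a heuristic of our own and does not reflect how Amorphous detects types: the function name looks_like_free_text and both thresholds are assumptions chosen for illustration. It flags a column when most values are distinct and the values are several words long.

```python
def looks_like_free_text(values, max_unique_ratio=0.5, max_words=3):
    """Heuristically flag a string column as free text rather than categorical."""
    values = [v for v in values if v]  # skip empty cells
    if not values:
        return False
    unique_ratio = len(set(values)) / len(values)
    avg_words = sum(len(v.split()) for v in values) / len(values)
    # Free text tends to be both highly unique and multi-word.
    return unique_ratio > max_unique_ratio and avg_words > max_words


# Made-up sample values, for illustration only.
devices = ["ios", "android", "ios", "web", "android", "ios"]
subjects = [
    "Your weekly usage report is ready",
    "Welcome to the product, here is how to start",
    "Reminder: your trial ends in three days",
]

print(looks_like_free_text(devices))
print(looks_like_free_text(subjects))
```

Columns flagged this way are candidates for the ignore attribute; borderline cases still need a manual look.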

Add Data Source

Next, in the “Data” section of the left navigation, click the “Add Data Source” button. Give your data source a descriptive name and upload the configuration file you just created.

Amorphous supports three different types of data sources:

CSV files
BigQuery
PostgreSQL

Snowflake support is under active development. To add a database connection, fill in the details for your selected database and click “Add Data Source”. Amorphous will attempt to connect to your database and run a select against the table name supplied. If it cannot connect successfully, an error message will appear.

CSV

Upload a CSV file that corresponds to the data configuration file, and click on “Add Data Source”. The file will be uploaded and the data source will be created.
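
Before uploading, it can be useful to confirm that the CSV really corresponds to the configuration file. The sketch below assumes that "corresponds" means the CSV has a header row whose names match the keys under columns; the function name compare_header and the sample row are hypothetical.

```python
import csv
import io


def compare_header(csv_text, config_columns):
    """Return (missing, extra) column names between a CSV header and the config."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    header = [h.strip() for h in header]
    missing = [c for c in config_columns if c not in header]
    extra = [h for h in header if h not in config_columns]
    return missing, extra


# Column names taken from the example amorphous.yaml.
config_columns = [
    "timestamp", "user_id", "feat1_usage", "feat2_usage",
    "random", "device", "daily_consumption",
]

# Made-up CSV content that lacks the target column.
csv_text = (
    "timestamp,user_id,feat1_usage,feat2_usage,random,device\n"
    "2024-01-01,42,3,0,0.7,ios\n"
)

missing, extra = compare_header(csv_text, config_columns)
print("missing:", missing)
print("extra:", extra)
```

For a file on disk, the same check can be run on the text read from the file.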

BigQuery

Amorphous uses a Google service account to connect to BigQuery. The service account should have the “BigQuery User” and “BigQuery Data Viewer” roles. In the table name field, enter the dataset and table name separated by a period, e.g., amorphous_app.product_churn_data.
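
The table name format described above can be checked with a small helper. This is only a sketch: split_bigquery_table is a hypothetical name, and it assumes the value contains exactly one period, with no project prefix.

```python
def split_bigquery_table(table_name):
    """Split 'dataset.table' into its two parts, or raise ValueError."""
    parts = table_name.strip().split(".")
    if len(parts) != 2 or not all(parts):
        raise ValueError(
            f"expected '<dataset>.<table>', got {table_name!r}"
        )
    return parts[0], parts[1]


dataset, table = split_bigquery_table("amorphous_app.product_churn_data")
print(dataset, table)
```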

PostgreSQL

For PostgreSQL, fill in the following fields.

username and password correspond to the username and password needed to access the database. All passwords are securely encrypted prior to storage.
host corresponds to the public URL of the database.
- If you’re running a database locally (for testing purposes), set host to host.docker.internal.
database is the name of the database.
table_name is the name of the table for the data source.

Click on “Add Data Source”.
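
If the connection fails, it can help to try the same values from another PostgreSQL client. The sketch below builds a standard PostgreSQL connection URI from the fields above, which can then be passed to a client such as psql. The function name postgres_uri is hypothetical, the credentials are made up, and the default port 5432 is an assumption, since the form fields listed here do not include a port.

```python
from urllib.parse import quote


def postgres_uri(username, password, host, database, port=5432):
    """Build a PostgreSQL connection URI, percent-encoding special characters."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    db = quote(database, safe="")
    return f"postgresql://{user}:{pwd}@{host}:{port}/{db}"


# Made-up values; host.docker.internal matches the local-testing case above.
uri = postgres_uri("analyst", "p@ss/word", "host.docker.internal", "analytics")
print(uri)
```

Percent-encoding matters when the password contains characters such as @ or /, which would otherwise be read as part of the URI structure.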