Adding a Data Source
Once you’ve designed your data source, it’s time to add it to Amorphous.
Create an Amorphous Data Definition File
First, create an Amorphous definition file that describes your data so Amorphous knows how to use it. Here’s an example amorphous.yaml file:
data_sources:
  - name: product_user_analytics
    columns:
      timestamp:
        - index
      user_id:
        - index
      feat1_usage:
        - independent
        - whatif
      feat2_usage:
        - independent
        - whatif
      random:
        - ignore
      device:
        - independent
      daily_consumption:
        - target
Each entry in the columns attribute must have one of the following values:
- index, specifying that the value is of type index. Index values are used to manage the grain of training and predictions.
- independent, specifying an independent variable.
- target, specifying a target variable.
In addition, independent columns can have one additional attribute, whatif, which specifies that the column can be used in what-if modeling. In general, you should use this attribute on “actionable” independent variables. For example, you may be able to increase clicks on a specific button, but you might not be able to change a user’s device type.
Add the ignore attribute to any column you want the model to ignore.
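To make the rules above concrete, here is a minimal Python sketch of a checker for the columns mapping. It is not part of Amorphous: CONFIG is the example amorphous.yaml written out as the dictionary a YAML parser would produce, and validate_columns and whatif_columns are hypothetical helpers. It assumes that a column marked ignore needs no other role, as with random in the example.

```python
from typing import Dict, List

# Hypothetical parsed form of the example amorphous.yaml (what a YAML
# library would return after loading the file).
CONFIG = {
    "data_sources": [
        {
            "name": "product_user_analytics",
            "columns": {
                "timestamp": ["index"],
                "user_id": ["index"],
                "feat1_usage": ["independent", "whatif"],
                "feat2_usage": ["independent", "whatif"],
                "random": ["ignore"],
                "device": ["independent"],
                "daily_consumption": ["target"],
            },
        }
    ]
}

ROLES = {"index", "independent", "target"}


def validate_columns(columns: Dict[str, List[str]]) -> List[str]:
    """Return a list of problems in a columns mapping (empty if valid)."""
    problems = []
    for name, attrs in columns.items():
        unknown = set(attrs) - ROLES - {"whatif", "ignore"}
        if unknown:
            problems.append(f"{name}: unknown attribute(s) {sorted(unknown)}")
        roles = [a for a in attrs if a in ROLES]
        # Assumption: an ignored column needs no role; others need exactly one.
        if "ignore" not in attrs and len(roles) != 1:
            problems.append(f"{name}: expected exactly one of {sorted(ROLES)}")
        # whatif is only allowed on independent columns.
        if "whatif" in attrs and "independent" not in attrs:
            problems.append(f"{name}: whatif requires independent")
    return problems


def whatif_columns(columns: Dict[str, List[str]]) -> List[str]:
    """Independent columns marked whatif that are not ignored."""
    return [
        name
        for name, attrs in columns.items()
        if "independent" in attrs and "whatif" in attrs and "ignore" not in attrs
    ]


cols = CONFIG["data_sources"][0]["columns"]
print(validate_columns(cols))  # []
print(whatif_columns(cols))    # ['feat1_usage', 'feat2_usage']
```

A column such as {"x": ["target", "whatif"]} would be reported with "x: whatif requires independent", since what-if modeling only applies to independent variables.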
Amorphous currently detects the following types of independent variables automatically:
- integers
- booleans
- floating-point numbers
- “categorical” strings, i.e., string values that represent categories
Notably, if an independent variable is a string that does not represent a category (e.g., the subject line of an email), you will have to mark it with ignore. This limitation will be removed in a future version of Amorphous.
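The following sketch shows one way this kind of detection could work on the raw string values of a CSV column. It is an illustrative heuristic, not Amorphous's actual logic; infer_type is a hypothetical name, and the rule that a string column is categorical when at most half of its values are distinct is an arbitrary assumption.

```python
from typing import Callable, List


def _all_parse(cast: Callable[[str], object], values: List[str]) -> bool:
    """True if every value converts with `cast` (e.g. int or float)."""
    try:
        for v in values:
            cast(v)
    except ValueError:
        return False
    return True


def infer_type(values: List[str], max_unique_ratio: float = 0.5) -> str:
    """Hypothetical heuristic: classify a column from its raw string values."""
    non_empty = [v.strip() for v in values if v.strip() != ""]
    if not non_empty:
        return "unknown"
    if all(v.lower() in {"true", "false"} for v in non_empty):
        return "boolean"
    if _all_parse(int, non_empty):
        return "integer"
    if _all_parse(float, non_empty):
        return "float"
    # Strings with few distinct values relative to row count look categorical.
    if len(set(non_empty)) / len(non_empty) <= max_unique_ratio:
        return "categorical"
    # Free-form text: such a column would need the ignore attribute.
    return "text"


print(infer_type(["1", "2", "3"]))                    # integer
print(infer_type(["0.5", "1.25"]))                    # float
print(infer_type(["true", "False", "TRUE"]))          # boolean
print(infer_type(["ios", "android", "ios", "web", "ios", "android"]))  # categorical
print(infer_type(["Welcome!", "Your invoice", "Last chance"]))         # text
```

The email subject lines in the last call are all distinct, so the heuristic treats them as free text rather than categories, which is the case the paragraph above says must be ignored.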
Add Data Source
Next, in the “Data” section of the left navigation, click the “Add Data Source” button. Give your data source a descriptive name and upload the configuration file you just created.
Amorphous supports three different types of data sources:
- CSV files
- BigQuery
- PostgreSQL
Support for Snowflake is under active development.
To add a database connection, fill in the details for your selected database and click “Add Data Source”. Amorphous will attempt to connect to your database and perform a SELECT from the supplied table name. If it cannot connect successfully, an error message will appear.
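A check of this kind can be approximated by running a small query such as SELECT * FROM <table> LIMIT 1 and reporting any failure. The sketch below illustrates the idea with Python's built-in sqlite3 module standing in for BigQuery or PostgreSQL so that it runs anywhere; check_table is a hypothetical helper, and the LIMIT 1 is an assumption about how small the test query is, not a documented detail of Amorphous.

```python
import sqlite3


def check_table(conn: sqlite3.Connection, table_name: str) -> str:
    """Hypothetical check: try to read one row from the named table."""
    # Quote the identifier so the table name is not interpreted as SQL.
    quoted = '"' + table_name.replace('"', '""') + '"'
    try:
        conn.execute(f"SELECT * FROM {quoted} LIMIT 1").fetchall()
    except sqlite3.Error as exc:
        return f"Error: could not read from {table_name}: {exc}"
    return "OK"


# In-memory SQLite database standing in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_user_analytics (user_id INTEGER, device TEXT)")
print(check_table(conn, "product_user_analytics"))  # OK
print(check_table(conn, "missing_table"))           # Error: ... no such table
```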
CSV
Upload a CSV file that matches the data definition file and click “Add Data Source”. The file will be uploaded and the data source will be created.
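Before uploading, you may want to confirm that the CSV header lines up with the columns in your definition file. This is a hypothetical pre-flight check, not something Amorphous documents; compare_header takes the CSV contents as a string (when reading from disk, open the file with newline="", as the csv module recommends).

```python
import csv
import io
from typing import Dict, List, Tuple


def compare_header(csv_text: str, columns: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Hypothetical check: return (columns missing from the CSV, CSV columns not in the definition)."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    missing = [name for name in columns if name not in header]
    extra = [name for name in header if name not in columns]
    return missing, extra


csv_text = "timestamp,user_id,feat1_usage,device\n2024-01-01,42,3,ios\n"
columns = {
    "timestamp": ["index"],
    "user_id": ["index"],
    "feat1_usage": ["independent", "whatif"],
    "daily_consumption": ["target"],
}
print(compare_header(csv_text, columns))  # (['daily_consumption'], ['device'])
```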
BigQuery
Amorphous uses a Google service account to connect to BigQuery. The service account should have the “BigQuery User” and “BigQuery Data Viewer” roles. In the table name field, enter the dataset and table name separated by a period, e.g., amorphous_app.product_churn_data.
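If you generate the table name programmatically, a small check like the one below catches values that don't follow the dataset.table form described above. split_table_name is a hypothetical helper; it deliberately rejects the three-part project.dataset.table form, since this page only describes the two-part one.

```python
from typing import Tuple


def split_table_name(table_name: str) -> Tuple[str, str]:
    """Hypothetical helper: split '<dataset>.<table>' into its two parts."""
    parts = table_name.split(".")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected <dataset>.<table>, got {table_name!r}")
    return parts[0], parts[1]


print(split_table_name("amorphous_app.product_churn_data"))
# ('amorphous_app', 'product_churn_data')
```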
PostgreSQL
For PostgreSQL, fill in the fields as follows:
- username and password are the credentials needed to access the database. All passwords are securely encrypted prior to storage.
- host is the public hostname or address of the database.
  - If you’re running a database locally (for testing purposes), you can access it with host.docker.internal.
- database is the name of the database.
- table_name is the name of the table for the data source.
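The fields above are the same components that make up a standard PostgreSQL connection URI (postgresql://user:password@host:port/database). The hypothetical postgres_uri helper below assembles them, percent-encoding each part so characters like @ or : in a password don't break the URI. The port is an assumption: the form does not list one, so the sketch defaults to PostgreSQL's standard port, 5432. table_name is not part of the URI; it identifies the table Amorphous selects from.

```python
from urllib.parse import quote


def postgres_uri(username: str, password: str, host: str, database: str, port: int = 5432) -> str:
    """Hypothetical helper: build a libpq-style connection URI from the form fields."""
    # safe="" percent-encodes every reserved character, including '@', ':' and '/'.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    db = quote(database, safe="")
    return f"postgresql://{user}:{pwd}@{host}:{port}/{db}"


print(postgres_uri("analyst", "p@ss:word", "host.docker.internal", "analytics"))
# postgresql://analyst:p%40ss%3Aword@host.docker.internal:5432/analytics
```

A URI in this form is accepted by libpq-based clients such as psql, which gives you a way to try the credentials yourself.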
Click on “Add Data Source”.