Plugin Storage

Uploading files

When a plugin has multiple stages, is deployable, or does a batch update of existing data, we often need to retrieve saved files or models from a previous stage.

On the initial plugin run, the manifest has two keys related to uploading and downloading: getUploadUrls and downloadUrls. Before we can upload a file, we first need to get a signed upload url. To do so, take the value under stage in the manifest and use it as the key into getUploadUrls. Then append to that url the path of the file you want to store the upload under, and do a GET on the result. Use the same file path to download the file in any later stage. In our example we will use model1.pkl.

Here is the relevant part of the plugin JSON manifest for the initial stage:

{
  "stage": "initial",
  "dataUrls": {
    "initial": "https://data.stormly.com/api/plugin/query_dataset/abcdef123456"
  },
  "downloadUrls": {
    "initial": "https://s3.eu-central-1.amazonaws.com/storage.stormly.com/abcd/1234"
  },
  "getUploadUrls": {
    "initial": "https://www.stormly.com/api/developer/upload_url/abcd/1234"
  },
  ...

To get the upload url for /model1.pkl in curl, we execute:

SIGNED_UPLOAD_URL=$(curl -s "https://www.stormly.com/api/developer/upload_url/abcd/1234/model1.pkl")

Next we actually upload the file, using the upload url from the previous step:

curl --upload-file model1.pkl "$SIGNED_UPLOAD_URL"
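The same two steps can be sketched in Python. This is a minimal illustration rather than an official client: the function names upload_request_url and upload_file are hypothetical, and it assumes, as the curl example implies, that the GET returns the signed url as plain text. It has not been run against the live service.

```python
import urllib.request


def upload_request_url(manifest: dict, file_path: str) -> str:
    """Build the url to GET in order to obtain a signed upload url."""
    stage = manifest["stage"]  # uploads are only possible for the current stage
    base = manifest["getUploadUrls"][stage]
    return base.rstrip("/") + "/" + file_path.lstrip("/")


def upload_file(manifest: dict, local_path: str, file_path: str) -> None:
    """Fetch a fresh signed url, then PUT the local file to it."""
    with urllib.request.urlopen(upload_request_url(manifest, file_path)) as resp:
        # Assumption: the response body is the signed url as plain text.
        signed_url = resp.read().decode("utf-8").strip()

    # Reads the whole file into memory, which is fine for a small model file.
    with open(local_path, "rb") as fh:
        payload = fh.read()

    request = urllib.request.Request(signed_url, data=payload, method="PUT")
    with urllib.request.urlopen(request):
        pass  # urlopen raises HTTPError if the upload is rejected
```

With the manifest above, upload_file(manifest, "model1.pkl", "model1.pkl") would request the signed url for abcd/1234/model1.pkl and then upload to it.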

Note that you can only upload into the current stage. If the manifest of the current plugin run has "stage": "initial", you can only upload using the getUploadUrls entry for initial. Once you are in stage1, you can no longer use the getUploadUrls entry for initial. Downloading will work for any stage, from any stage.

In plugin development mode, these presigned upload urls are only valid for 8 hours, so in general make sure that your code GETs the upload url right before it does the upload. In production these presigned urls are valid for a few days.

Downloading files

To download a file from another stage, simply look the url up via the downloadUrls object, using the key of the stage you want to download from, and append the file path to the url.

For example, in stage1 we may receive the JSON manifest below, and want to download the model1.pkl that was uploaded in the initial stage:

{
  "stage": "stage1",
  "dataUrls": {
    "initial": "https://data.stormly.com/api/plugin/query_dataset/abcdef123456",
    "60secData": "https://data.stormly.com/api/plugin/query_dataset/ghijkl789012",
    "latestData": "https://data.stormly.com/api/plugin/query_dataset/mnopqrst345678"
  },
  "downloadUrls": {
    "initial": "https://s3.eu-central-1.amazonaws.com/storage.stormly.com/abcd/1234",
    "stage1": "https://s3.eu-central-1.amazonaws.com/storage.stormly.com/abcd/5678"
  },
  "getUploadUrls": {
    "initial": "https://www.stormly.com/api/developer/upload_url/abcd/1234",
    "stage1": "https://www.stormly.com/api/developer/upload_url/abcd/5678"
  },
  ...

To download model1.pkl we do a GET on that url, using curl as an example:

curl "https://s3.eu-central-1.amazonaws.com/storage.stormly.com/abcd/1234/model1.pkl" \
  -o model1.pkl
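The lookup and download can also be sketched in Python. The helper names download_url and download_file are hypothetical, and the sketch assumes the download url answers a plain GET without extra headers, as the curl example suggests.

```python
import os
import shutil
import urllib.request


def download_url(manifest: dict, stage: str, file_path: str) -> str:
    """Look up the download url of a stage and append the file path."""
    base = manifest["downloadUrls"][stage]
    return base.rstrip("/") + "/" + file_path.lstrip("/")


def download_file(manifest: dict, stage: str, file_path: str) -> str:
    """Download a file into the current directory and return its local name."""
    local_name = os.path.basename(file_path)  # always write to the current path
    url = download_url(manifest, stage, file_path)
    with urllib.request.urlopen(url) as resp, open(local_name, "wb") as out:
        shutil.copyfileobj(resp, out)  # stream to disk instead of buffering
    return local_name
```

In stage1, download_file(manifest, "initial", "model1.pkl") would fetch the file uploaded in the initial stage and save it as model1.pkl in the current directory.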

It's important to note that all files should be written or downloaded to the current path. Do not use absolute paths or paths outside the current directory, because they will not exist or will not be writeable when your plugin runs on the platform in production.
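One way to enforce this rule in plugin code is a small guard that runs before any file is written. The function safe_relative_path below is a hypothetical helper, and rejecting ".." components is one interpretation of "other paths", not a documented platform check.

```python
from pathlib import PurePosixPath


def safe_relative_path(name: str) -> str:
    """Return name unchanged if it stays inside the current directory."""
    path = PurePosixPath(name)
    # Reject absolute paths and anything that climbs out of the current path.
    if path.is_absolute() or ".." in path.parts:
        raise ValueError(f"refusing to write outside the current path: {name}")
    return str(path)
```

Calling safe_relative_path("model1.pkl") returns the name as is, while "/tmp/model1.pkl" or "../model1.pkl" raises a ValueError.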

All downloadUrls from all stages are also available at the plugin deployment stage, if the plugin supports deployment. The deployment stage no longer has access to any upload urls, so uploading there is not possible, but in most cases it is not needed either.

Another thing to note is that the platform takes care of hyper-parameter variations and their storage, so each upload and download storage path can stay fixed. When we request model1.pkl in the deployment or batching plugin stage, it will be the one that was trained with the optimal hyper-parameters.

Limitations

  • Always download files to the current path. Never use absolute paths, as they will not be writeable when running in production.
  • Uploading is done with a PUT request.
  • Upload urls are valid for at most 5 days (only 8 hours in plugin development mode). After that, they don't accept uploads anymore.
  • The maximum size is 5GB per file.
  • If you use curl, make sure to place the upload or download urls between quotes.
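
The file size limit can be checked locally before requesting an upload url. The sketch below is illustrative only: check_upload_size and MAX_UPLOAD_BYTES are hypothetical names, and reading 5GB as 5 × 10^9 bytes is an assumption, chosen because it is the more conservative interpretation.

```python
import os

# Assumption: "5GB" is read as 5 * 10**9 bytes (the stricter reading).
MAX_UPLOAD_BYTES = 5 * 10**9


def check_upload_size(local_path: str, limit: int = MAX_UPLOAD_BYTES) -> int:
    """Return the file size in bytes, or raise if it exceeds the limit."""
    size = os.path.getsize(local_path)
    if size > limit:
        raise ValueError(f"{local_path} is {size} bytes, above the {limit} byte limit")
    return size
```

Running this check right before fetching the signed upload url avoids starting a transfer that would be rejected.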