Getting Started with "Datapipeline"

Introduction

This quick start guide helps you get up to speed with the pipeline editor and deploy your data processing tasks efficiently.

Prerequisites

Before you begin, make sure you have the following:

  • A user account on the platform.
  • Access to the Datapipeline application.
  • Data sources ready to be processed (you can upload files beforehand or connect to data sources through the interface).

To get started, log in to the Datapipeline application using your credentials. Once logged in, you'll be directed to the main interface where you can create and manage your data processing pipelines.

Creating a Pipeline

Access the Pipeline Editor

In the main menu, click on "Pipelines", then click "Create a Pipeline". This will take you to the visual editor where you can design and manage your data processing pipeline.

Add Processing Blocks

In the pipeline editor, you can add processing blocks (also known as "nodes"). These blocks represent specific steps in your data processing workflow, such as:

  • Data extraction (e.g., from a database or CSV file in S3)
  • Data transformation (cleaning, normalization, aggregation)
  • Data loading (e.g., into another database or file storage)

To add a block, click the "Add Block" button and choose the type of processing step you want to configure. Each block can be configured individually through an intuitive interface.
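
To make these block types concrete, here is a minimal Python sketch of what an extract → transform → load sequence does conceptually. The file names, columns, and functions are hypothetical illustrations, not part of the platform: in Datapipeline you configure these steps as blocks in the visual editor rather than writing code.

    import csv

    # Hypothetical sketch of what the three block types do conceptually.
    # File and column names are made up for illustration.

    def extract(path):
        # Extraction block: read raw records from a source (here, a local CSV file).
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    def transform(rows):
        # Transformation block: clean and normalize the records.
        cleaned = []
        for row in rows:
            if not row.get("email"):                      # drop incomplete records
                continue
            row["email"] = row["email"].strip().lower()   # normalize the email field
            cleaned.append(row)
        return cleaned

    def load(rows, path):
        # Loading block: write the processed records to a destination.
        if not rows:
            return
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=rows[0].keys())
            writer.writeheader()
            writer.writerows(rows)

    load(transform(extract("customers.csv")), "customers_clean.csv")

Each of these steps corresponds to one block in the editor; linking the blocks (covered below) defines the order in which they run.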

Configure the Blocks

Each block has a set of parameters that you can configure based on your requirements. For example:

  • For a data extraction block, you'll need to provide access details to the source (URL, credentials, etc.).
  • For a transformation block, you can set rules to clean or manipulate the data.
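
As an illustration of the kind of parameters involved, the sketch below shows plausible settings for an extraction block and rules for a transformation block, written as Python dictionaries. All field names and values are assumptions made for this example; the actual fields you see depend on the block type you select in the editor.

    # Hypothetical extraction-block parameters (field names are illustrative only).
    extraction_config = {
        "type": "postgresql",
        "host": "db.example.com",
        "port": 5432,
        "database": "sales",
        "user": "pipeline_reader",
        "password": "********",              # credentials; never hard-code real secrets
        "query": "SELECT * FROM orders",     # which records to extract
    }

    # Hypothetical transformation rules: each rule names a column and an operation.
    transformation_rules = [
        {"column": "email", "operation": "lowercase"},
        {"column": "amount", "operation": "cast", "to": "float"},
        {"column": "order_date", "operation": "parse_date", "format": "%Y-%m-%d"},
    ]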

Linking the Blocks

Once the blocks are added and configured, it's time to link them to define the order of execution for the pipeline. To do this, click on a block and drag an arrow to the next block in the processing flow.

Example of a Simple Flow:

  1. Extraction → 2. Transformation → 3. Loading

The arrows indicate the execution order of the blocks. This flow will be executed automatically when you deploy the pipeline.
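
To illustrate how the arrows translate into an execution order, here is a small Python sketch that represents each arrow as a (source, target) pair and walks the chain from start to finish. The block names and link structure are assumptions for this example; the editor derives the order for you once the arrows are drawn.

    # Hypothetical links: each pair means "run the first block, then the second".
    links = [("Extraction", "Transformation"), ("Transformation", "Loading")]

    # The starting block is the one no arrow points to.
    targets = {dst for _, dst in links}
    order = [next(src for src, _ in links if src not in targets)]

    # Follow the arrows to build the full execution order.
    next_block = dict(links)
    while order[-1] in next_block:
        order.append(next_block[order[-1]])

    print(" -> ".join(order))  # Extraction -> Transformation -> Loading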

Testing the Pipeline

Before deploying your pipeline, it’s a good idea to test it to ensure it works as expected. To test the pipeline:

  • Click on the "Debug" button.
  • Review the logs and test results to verify that the data is processed correctly.

If any errors are detected, the editor will highlight the affected blocks and provide information to help you troubleshoot.
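
Independently of the "Debug" button, it can help to check your transformation logic against a small hand-made sample so you know what correct output looks like when reading the logs. The sketch below is a hypothetical example of such a check; the rules and field names mirror the illustrative configuration used earlier, not an actual pipeline.

    # Hypothetical sanity check on a tiny sample before debugging in the editor.
    sample = {"email": "  Alice@Example.COM ", "amount": "19.90"}

    def apply_rules(row):
        # Re-implements the illustrative rules (lowercase email, cast amount to float).
        return {"email": row["email"].strip().lower(), "amount": float(row["amount"])}

    expected = {"email": "alice@example.com", "amount": 19.9}
    assert apply_rules(sample) == expected, "transformation rules do not produce the expected output"
    print("Sample transformation looks correct.")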

Deploying the Pipeline

Once your pipeline is configured and tested, you're ready to deploy it. Deploying means that your data processing task will run automatically according to the schedule, triggering events, and configuration you set.

To deploy the pipeline:

  1. Click the "Deploy" button in the main menu.
  2. Confirm the deployment.
  3. Monitor the pipeline execution in real-time through the monitoring interface.

Monitoring and Managing Deployed Pipelines

After deploying your pipeline, you can monitor it through the dashboard. The interface allows you to track the execution status, view detailed logs, and see any errors or alerts.

Possible Actions:

  • Pause: If you need to interrupt the pipeline execution to make modifications, click "Pause".
  • Redeploy: If you make changes to the pipeline after deployment, you can redeploy the updated version by clicking "Redeploy".
  • Delete: To delete a pipeline, click "Delete" in the options menu.

Conclusion

Congratulations! You now have the essentials to create, test, deploy, and manage your data processing pipelines with Datapipeline. This guide covers the key steps to get started, but the platform offers many other advanced features to optimize and customize your workflows.

If you have any questions or need further assistance, feel free to check our "Troubleshooting" section or reach out to our team.