Welcome to mu-pipelines!
This guide will help you set up and run mu-pipelines.
1️⃣ Install mu-pipelines
Ensure you have Python 3 installed, then run:
pip install 'mu-pipelines-driver[spark]'
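To confirm the install worked, a quick check along these lines can help. This is only a sketch: `check_environment` is a hypothetical helper written for this guide, not part of mu-pipelines, and it assumes the import name is `mu_pipelines_driver` (the name used in the import statements later in this guide).

```python
import importlib.util
import sys


def check_environment(package: str = "mu_pipelines_driver") -> dict:
    """Report whether Python 3 is running and whether the package can be found."""
    return {
        "python_ok": sys.version_info >= (3,),
        # find_spec returns None when a top-level package is not installed
        "package_found": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    print(check_environment())
```

If `package_found` is `False`, re-run the pip command above in the same Python environment that your notebook or script uses.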
2️⃣ Create Config Files
Our config files are divided into three sections: ingest, transform, and destination.
Configuration Parameters:
- Ingestxx: The source from which data is ingested.
- Transformxx: The transformations to apply to the data.
- Destinationxx: The destination where the data is stored.
Coming Soon
- Executexx: Run Python execute commands, e.g. ExecuteZIP, ExecuteFileCopy, ExecuteFileEncrypt.
- Validation: Run data quality checks based on defined parameters.
- Alerts: Configure where alerts are sent.
- Schedule: Add a schedule for your job.
- Dependency: List of configs that the job depends on.
Refer to the config documentation for a list of available execute and destination commands.
3️⃣ Run a Sample Config File
In your notebook or Python file, add the following code:
from mu_pipelines_driver.run_config import run_config
df = run_config(
    [
        {
            "execution": [
                {
                    "type": "CSVReadCommand",
                    "file_location": "/home/iceberg/data/file/people.csv",
                    "delimiter": ",",
                    "quotes": "\"",
                    "additional_attributes": [
                        {"key": "header", "value": "True"}
                    ]
                }
            ],
            "destination": [
                {
                    "type": "table",
                    "table_name": "crm.raw.people",
                    "mode": "overwrite"
                }
            ]
        }
    ],
    {"library": "spark"},
    {"connections": []}
)
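Before handing a config to `run_config`, it can be useful to see which command types each section contains. The helper below is a hypothetical sketch, not part of mu-pipelines. It assumes only what the sample above shows: each section is a list of command dicts, and each command has a `"type"` key.

```python
from __future__ import annotations

from typing import Any


def summarize_config(job: dict[str, Any]) -> dict[str, list[str]]:
    """Map each section name to the command types it contains."""
    summary: dict[str, list[str]] = {}
    for section, commands in job.items():
        types = []
        for index, command in enumerate(commands):
            if "type" not in command:
                raise ValueError(f"{section}[{index}] has no 'type' key")
            types.append(command["type"])
        summary[section] = types
    return summary


# Shortened copy of the sample job above (only a few keys kept)
SAMPLE_JOB = {
    "execution": [{"type": "CSVReadCommand", "delimiter": ","}],
    "destination": [{"type": "table", "mode": "overwrite"}],
}

if __name__ == "__main__":
    print(summarize_config(SAMPLE_JOB))
    # {'execution': ['CSVReadCommand'], 'destination': ['table']}
```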
4️⃣ Run Config Using Spark Submit
You can run a config on a production Spark environment using spark-submit.
Create a sample Python script that imports the module:
from mu_pipelines_driver.__main__ import main
main()
Then run the spark-submit command:
spark-submit /home/scripts/app_args.py --global-properties /home/scripts/global-properties.json --connection-properties /home/scripts/connection-properties.json /home/scripts/raw/people/people.json
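The command above expects three JSON files. This guide does not show their contents, so the sketch below assumes they hold the same three objects passed to `run_config` in step 3: the job list, `{"library": "spark"}`, and `{"connections": []}`. Check the config documentation before relying on that. `write_job_files` is a hypothetical helper, and the demo writes to a temporary directory rather than `/home/scripts`.

```python
import json
import shlex
import tempfile
from pathlib import Path


def write_job_files(base_dir, job_configs, global_properties, connection_properties):
    """Write the three JSON files and return the matching spark-submit argument list."""
    base = Path(base_dir)
    config_path = base / "raw" / "people" / "people.json"
    global_path = base / "global-properties.json"
    connection_path = base / "connection-properties.json"

    # ASSUMPTION: each file contains the object that run_config takes in the same position
    for path, payload in (
        (config_path, job_configs),
        (global_path, global_properties),
        (connection_path, connection_properties),
    ):
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(payload, indent=2), encoding="utf-8")

    # app_args.py is the two-line script shown above; it is not created here
    return [
        "spark-submit", str(base / "app_args.py"),
        "--global-properties", str(global_path),
        "--connection-properties", str(connection_path),
        str(config_path),
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        command = write_job_files(
            scratch,
            [{"execution": [], "destination": []}],
            {"library": "spark"},
            {"connections": []},
        )
        print(shlex.join(command))
```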
5️⃣ Run Config Using Airflow
Execute the config using Airflow for workflow orchestration. Use the Spark connector in Airflow to issue a spark-submit command.
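One way to do this is with `SparkSubmitOperator` from the `apache-airflow-providers-apache-spark` package. The sketch below assumes Airflow 2.4 or later with that provider installed and a Spark connection named `spark_default`. The DAG id, task id, and start date are hypothetical; the file paths are the ones from the spark-submit command in step 4.

```python
from datetime import datetime

SCRIPTS_DIR = "/home/scripts"


def spark_submit_args(config_path: str, scripts_dir: str = SCRIPTS_DIR) -> list:
    """Arguments passed to app_args.py, mirroring the spark-submit command in step 4."""
    return [
        "--global-properties", f"{scripts_dir}/global-properties.json",
        "--connection-properties", f"{scripts_dir}/connection-properties.json",
        config_path,
    ]


def build_dag():
    # Imported here so the helper above can be used without Airflow installed
    from airflow import DAG
    from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

    with DAG(
        dag_id="mu_pipelines_people",      # hypothetical name
        start_date=datetime(2024, 1, 1),   # hypothetical date
        schedule=None,                     # trigger manually; set a schedule as needed
        catchup=False,
    ) as dag:
        SparkSubmitOperator(
            task_id="load_people",         # hypothetical name
            conn_id="spark_default",       # ASSUMPTION: an existing Spark connection
            application=f"{SCRIPTS_DIR}/app_args.py",
            application_args=spark_submit_args(f"{SCRIPTS_DIR}/raw/people/people.json"),
        )
    return dag
```

In an actual DAG file, add `dag = build_dag()` at module level so the Airflow scheduler can discover it.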
6️⃣ Chain Different Configurations to Meet Business Needs
Mu-pipelines allows you to chain multiple configurations together to address complex business requirements.
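This guide does not describe how chaining is configured, so the following is only one possible approach: run several job configs one after another, in order. `run_chain` is a hypothetical helper, not a mu-pipelines function. It takes the runner as a parameter so that it makes no assumptions about `run_config` beyond the call shown in step 3.

```python
from typing import Any, Callable, Iterable


def run_chain(jobs: Iterable[dict], runner: Callable[[list], Any]) -> list:
    """Run each job config in order and collect whatever the runner returns."""
    results = []
    for job in jobs:
        # Each job is wrapped in a list, matching the first argument in step 3
        results.append(runner([job]))
    return results


# Usage sketch (requires mu-pipelines; raw_job and curated_job are hypothetical configs):
#
#   from mu_pipelines_driver.run_config import run_config
#
#   results = run_chain(
#       [raw_job, curated_job],
#       lambda cfg: run_config(cfg, {"library": "spark"}, {"connections": []}),
#   )
```

In production, the same ordering could instead be expressed as consecutive Airflow tasks, each issuing its own spark-submit command.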
🎯 Want to add a custom connector? Stay tuned for developer guides!
To connect with our team or request specific connectors, please email us at mupipelines@gmail.com.
For more details, please visit our website: https://mosaicsoft.wixsite.com/mu-pipelines