DestinationParquet Command Documentation (Coming Soon)

The DestinationParquet command is used to specify the destination for writing data in parquet format. Parquet is a popular columnar storage format known for its efficient compression and fast read performance. This guide will walk you through configuring a Parquet destination for your data pipeline.

{
  "destination": [
    {
      "type": "destinationParquet",
      "file_location": "/home/data_lake/contacts/",
      "partition_columns": ["date", "region"],
      "compression": "snappy",
      "additional_attributes": [
        { "key": "parquet.encryption.column.keys", "value": "SSNKey:SSN" },
        { "key": "overwrite", "value": "true" }
      ]
    }
  ]
}
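
With this configuration, rows are grouped into one sub-directory per combination of partition values. Assuming the conventional Hive-style layout used by most Parquet writers (an illustration only, not a guarantee about the exact file names Mu-Pipelines produces), the output under file_location would look roughly like:

/home/data_lake/contacts/
  date=2024-01-01/
    region=EU/
      part-0000.snappy.parquet
    region=US/
      part-0000.snappy.parquet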

destinationParquet Command - Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| type | String | ✅ Yes | The type of command (destinationParquet). |
| file_location | String | ✅ Yes | The location of the output Parquet files. This can be a relative or absolute path. |
| partition_columns | Array | ❌ No | Columns to partition the data by, improving query performance. |
| compression | String | ❌ No | Compression codec for the Parquet file (snappy, gzip, brotli, etc.). Default is snappy. |
| additional_attributes | Array | ❌ No | Additional attributes for customization (e.g., mode, append, etc.). |

Detailed Explanation of Parameters

  • type: Always set to "destinationParquet". This indicates that the destination is a Parquet file.

  • file_location: The file path where the Parquet data will be written. This can be either an absolute or a relative path, for example /home/data_lake/contacts/. If the location does not already exist, it will be created.

  • partition_columns: Columns to partition the data by, improving query performance.

  • compression: Compression codec for the Parquet file (snappy, gzip, brotli, etc.). Default is snappy. An equivalent standalone write that uses both partitioning and compression is sketched below.
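
To make the effect of partition_columns and compression concrete, here is a minimal sketch of an equivalent standalone write using pyarrow. The sample table, column names, and path are illustrative only; Mu-Pipelines performs this step for you and may not use pyarrow internally.

import pyarrow as pa
import pyarrow.parquet as pq

# Sample rows standing in for the data produced by the pipeline.
table = pa.table({
    "name": ["Ada", "Grace", "Linus"],
    "date": ["2024-01-01", "2024-01-01", "2024-01-02"],
    "region": ["EU", "US", "EU"],
})

# Partition on "date" and "region" and compress with snappy,
# mirroring the destinationParquet configuration above.
pq.write_to_dataset(
    table,
    root_path="/home/data_lake/contacts/",
    partition_cols=["date", "region"],
    compression="snappy",
)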

The additional_attributes parameter allows you to specify extra properties for the Parquet write, expressed as key/value pairs:

  • key: The name of the attribute.
  • value: The value associated with the attribute.

Example of Additional Attributes

"additional_attributes": [
  {
    "key": "overwrite",
    "value": "true"
  }
]
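
Once a pipeline run has completed, you can sanity-check the written dataset outside of Mu-Pipelines. Below is a minimal sketch using pyarrow (assumed to be installed) with the example path from above; reading the root directory picks up every partition.

import pyarrow.parquet as pq

# Partition columns ("date", "region") are reconstructed from the
# directory names rather than stored inside each file.
table = pq.read_table("/home/data_lake/contacts/")
print(table.schema)
print(table.num_rows)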

Conclusion

With Mu-Pipelines, writing to Parquet becomes effortless. Just configure your destination, and the framework handles the rest.

Related Commands

  • IngestParquet: Reads data from a Parquet file and uses it as input for your data pipeline. This command is helpful when you're looking to ingest data from Parquet files for further processing.