WriteParquetFile

WriteParquetFile writes data rows to an Apache Parquet file.

The function accepts any Linx data source, such as a database result set or any other list, and writes its rows to a Parquet file. The output schema is inferred from the data at runtime, so no schema file is required.


Properties

File path

The absolute or UNC path for the output .parquet file. Linx expressions are supported.

Data

The data source providing the rows to write. Connect any Linx list of objects. Each item in the source becomes one row in the Parquet file.

The output schema is inferred from the first row at runtime. Each property of the row object maps to a column in the Parquet file.

Compression codec

The compression algorithm applied to each row group written to the file.

Value     Description
Snappy    Fast compression with a moderate ratio. Default.
Gzip      Slower compression with a better ratio. Use when minimising file size is a priority.
Brotli    High compression ratio with higher CPU cost.
Lz4       Very fast compression with a lower ratio. Use when write speed is critical.
Zstd      Good compression ratio with fast decompression. Suitable for archival use.
None      No compression. Use for maximum read speed or when the data is already compressed.
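The advice for None can be checked with any general-purpose compressor. The Python sketch below uses the standard-library gzip module (the Gzip codec is also gzip/DEFLATE based) on two inputs: a repetitive column of status values, and random bytes standing in for data that is already compressed. It illustrates general compressor behaviour only; it does not call the Parquet plugin, and the ratios are not the ones the plugin would produce for a real file.

```python
import gzip
import random


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(gzip.compress(data)) / len(data)


# A repetitive column, typical of status codes or category values.
repetitive = b"".join(b"ACTIVE," if i % 3 else b"CLOSED," for i in range(10_000))

# Random bytes stand in for data that is already compressed (images, archives).
already_compressed = random.Random(0).randbytes(70_000)

if __name__ == "__main__":
    print(f"repetitive column:  {compression_ratio(repetitive):.3f}")
    print(f"already compressed: {compression_ratio(already_compressed):.3f}")
```

The repetitive column shrinks to a small fraction of its size, while the random input comes out at roughly its original size or slightly larger, because there is no redundancy left to remove. In that case compression only costs CPU time, which is the situation the None option is intended for.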

Exist option

Behaviour when the output file already exists at the path specified.

Value                Behaviour
OverwriteFile        Deletes the existing file and writes a new one.
IncrementFileName    Appends _1, _2, etc. to the file name until an unused name is found (for example, report_1.parquet).
ThrowException       Raises an exception if the file already exists.
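The three Exist option behaviours amount to a path-resolution step that runs before any data is written. The Python sketch below is an illustration under assumptions, not the plugin's implementation: ExistOption and resolve_output_path are hypothetical names, and the numbering rule (start at _1 and insert the counter before the .parquet extension) is inferred from the report_1.parquet example. The ThrowException message reuses the text listed under Validation.

```python
import tempfile
from enum import Enum
from pathlib import Path


class ExistOption(Enum):
    """Hypothetical mirror of the Exist option values."""
    OVERWRITE_FILE = "OverwriteFile"
    INCREMENT_FILE_NAME = "IncrementFileName"
    THROW_EXCEPTION = "ThrowException"


def resolve_output_path(path: Path, option: ExistOption) -> Path:
    """Return the path to write to after applying the Exist option."""
    if not path.exists():
        return path  # No clash: write to the requested path.

    if option is ExistOption.OVERWRITE_FILE:
        path.unlink()  # Delete the existing file so a new one can be written.
        return path

    if option is ExistOption.INCREMENT_FILE_NAME:
        counter = 1
        # report.parquet -> report_1.parquet, report_2.parquet, ... until one is free.
        while True:
            candidate = path.with_name(f"{path.stem}_{counter}{path.suffix}")
            if not candidate.exists():
                return candidate
            counter += 1

    # ThrowException: leave the existing file untouched and fail.
    raise FileExistsError(f"File already exists: {path}.")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "report.parquet"
        target.touch()  # Simulate an existing output file.
        print(resolve_output_path(target, ExistOption.INCREMENT_FILE_NAME).name)  # report_1.parquet
```

Because the sketch checks for existence and only later would write, another process could create the chosen file in between; a production writer would typically open the file with an exclusive-create mode to close that gap.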

Row group size

The number of rows buffered in memory before each Parquet row group is flushed to disk. Default: 5000. Minimum: 1.

Lower values reduce memory usage but produce smaller row groups, which compress less effectively. Higher values improve compression but increase peak memory use while writing. The default suits most workloads.
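Row group size describes a buffer-and-flush pattern: rows accumulate in memory until the buffer holds the configured number, that batch is written as one row group, and any remainder becomes a final, smaller row group. The Python sketch below shows only the batching logic; write_in_row_groups is a hypothetical name and the flush callback stands in for the actual Parquet row-group write.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def write_in_row_groups(
    rows: Iterable[T],
    row_group_size: int,
    flush: Callable[[List[T]], None],
) -> int:
    """Buffer rows and call flush once per row group; return the number of groups."""
    if row_group_size < 1:
        raise ValueError("RowGroupSize must be at least 1.")

    buffer: List[T] = []
    groups = 0
    for row in rows:
        buffer.append(row)
        if len(buffer) == row_group_size:
            flush(buffer)  # At most row_group_size rows are held in memory.
            buffer = []    # New list, in case flush keeps a reference to the old one.
            groups += 1

    if buffer:
        flush(buffer)  # The last row group holds the remaining rows.
        groups += 1
    return groups


if __name__ == "__main__":
    sizes: List[int] = []
    count = write_in_row_groups(range(12_001), 5000, lambda batch: sizes.append(len(batch)))
    print(count, sizes)  # 3 [5000, 5000, 2001]
```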


Schema Inference

The output schema is inferred from the first row at runtime. Each property becomes a column, and its .NET type is mapped to the corresponding Parquet type as follows.

Linx Type            Parquet Type
integer              INT32
double               DOUBLE
boolean              BOOLEAN
string               BYTE_ARRAY (UTF8)
decimal              FIXED_LEN_BYTE_ARRAY (DECIMAL)
DateTime             INT64 (TIMESTAMP millis)
byte[]               BYTE_ARRAY
int?, double?, etc.  Nullable (OPTIONAL) column of the corresponding type

If the data source contains no rows, the function still writes a Parquet file; it contains the schema but no data rows.
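The inference rule (read the properties of the first row and map each property's type through the table above) can be sketched in Python with a dataclass as the row type. This is an illustration under stated assumptions: Python's int, float, bool, str, Decimal, datetime and bytes stand in for the Linx types, Optional[...] stands in for the nullable .NET types, and infer_schema, to_timestamp_millis and OrderRow are hypothetical names, not part of the plugin. Treating naive DateTime values as UTC is also an assumption.

```python
import dataclasses
from datetime import datetime, timedelta, timezone
from decimal import Decimal
from typing import Any, List, Optional, Tuple, Union, get_args, get_origin, get_type_hints

# Python stand-ins for the Linx types, mapped to the Parquet types in the table.
PARQUET_TYPES = {
    int: "INT32",
    float: "DOUBLE",
    bool: "BOOLEAN",
    str: "BYTE_ARRAY (UTF8)",
    Decimal: "FIXED_LEN_BYTE_ARRAY (DECIMAL)",
    datetime: "INT64 (TIMESTAMP millis)",
    bytes: "BYTE_ARRAY",
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def infer_schema(first_row: Any) -> List[Tuple[str, str, bool]]:
    """Return (column name, Parquet type, nullable) for each field of a dataclass row."""
    hints = get_type_hints(type(first_row))
    schema = []
    for field in dataclasses.fields(first_row):
        field_type = hints[field.name]
        nullable = False
        # Optional[X] is Union[X, None]: unwrap X and mark the column as nullable.
        if get_origin(field_type) is Union and type(None) in get_args(field_type):
            nullable = True
            field_type = next(t for t in get_args(field_type) if t is not type(None))
        schema.append((field.name, PARQUET_TYPES[field_type], nullable))
    return schema


def to_timestamp_millis(value: datetime) -> int:
    """Encode a datetime as INT64 milliseconds since the Unix epoch."""
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)  # Assumption: naive values are UTC.
    return (value - EPOCH) // timedelta(milliseconds=1)


@dataclasses.dataclass
class OrderRow:
    """Hypothetical row type, such as one row returned by a query."""
    order_id: int
    amount: Decimal
    placed_at: datetime
    note: Optional[str]


if __name__ == "__main__":
    row = OrderRow(1, Decimal("9.99"), datetime(2024, 1, 1), None)
    for column in infer_schema(row):
        print(column)
    print(to_timestamp_millis(row.placed_at))  # 1704067200000
```

The sketch maps each field by its declared type rather than by the runtime value, so a field that is None in the first row still gets a concrete column type.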


Validation

Condition                                         Error message
File path is empty                                "File path cannot be null or empty."
Output directory does not exist                   "Output directory does not exist: {dir}."
File exists and Exist option is ThrowException    "File already exists: {path}."
Row group size is less than 1                     "RowGroupSize must be at least 1."
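The validation rules can be read as a pre-flight check that runs before any rows are written. The Python below is illustrative only: validate_options is a hypothetical name, the exception types (ValueError, FileNotFoundError, FileExistsError) and the order of the checks are assumptions, and only the message texts come from the table.

```python
import os
import tempfile
from typing import Optional


def validate_options(file_path: Optional[str], exist_option: str, row_group_size: int) -> None:
    """Raise with the documented message if a property value is invalid."""
    if not file_path:  # Covers both None and "".
        raise ValueError("File path cannot be null or empty.")

    directory = os.path.dirname(file_path)
    if not os.path.isdir(directory):
        raise FileNotFoundError(f"Output directory does not exist: {directory}.")

    if exist_option == "ThrowException" and os.path.exists(file_path):
        raise FileExistsError(f"File already exists: {file_path}.")

    if row_group_size < 1:
        raise ValueError("RowGroupSize must be at least 1.")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        try:
            validate_options(os.path.join(tmp, "missing", "out.parquet"), "OverwriteFile", 5000)
        except FileNotFoundError as exc:
            print(exc)
```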


How To

Export data from a database to a Parquet file

This example queries a database and writes the results to a Parquet file.

Steps:

  1. Add an ExecuteSQL function (or an equivalent database query function) and set Return options to List of rows.

  2. From the Parquet plugin, drag WriteParquetFile onto the design canvas and place it after the query function.

  3. Set File path to the destination .parquet file path.

  4. Set Data to the result list from the database query (for example, ExecuteSQL.Result).

  5. Choose a Compression codec and Exist option as needed.
