WriteParquetFile
WriteParquetFile writes data rows to an Apache Parquet file.
The function accepts any Linx data source (a database result set or any list) and writes a Parquet file. The output schema is inferred from the data at runtime. No schema file is required.
Properties
File path
The absolute or UNC path for the output .parquet file. Linx expressions are supported.
Data
The data source providing the rows to write. Connect any Linx list of objects. Each item in the source becomes one row in the Parquet file.
The output schema is inferred from the first row at runtime. Each property of the row object maps to a column in the Parquet file.
Compression codec
The compression algorithm applied to each row group written to the file.
| Value | Description |
|---|---|
| Snappy | Fast compression with a moderate ratio. Default. |
| Gzip | Slower compression with a better ratio. Use when minimising file size is a priority. |
| Brotli | High compression ratio with higher CPU cost. |
| Lz4 | Very fast compression with a lower ratio. Use when write speed is critical. |
| Zstd | Good compression ratio with fast decompression. Suitable for archival use. |
| None | No compression. Use for maximum read speed or when the data is already compressed. |
Exist option
Behaviour when the output file already exists at the path specified.
| Value | Behaviour |
|---|---|
| OverwriteFile | Deletes the existing file and writes a new one. |
| IncrementFileName | Appends _1, _2, etc. to the file name until a non-existing name is found (for example, report_1.parquet). |
| ThrowException | Raises an exception if the file already exists. |
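The three options can be pictured with a short Python sketch. It illustrates the behaviour described in the table and is not Linx's implementation: the function name `resolve_output_path` is hypothetical, and Python's `FileExistsError` stands in for whatever exception type Linx raises.

```python
import os


def resolve_output_path(path: str, exist_option: str) -> str:
    """Return the path to write to, following the Exist option table."""
    if not os.path.exists(path):
        return path  # no conflict, use the path as given

    if exist_option == "OverwriteFile":
        os.remove(path)  # delete the existing file; the caller writes a new one
        return path

    if exist_option == "IncrementFileName":
        stem, ext = os.path.splitext(path)
        counter = 1
        # Try report_1.parquet, report_2.parquet, ... until a free name is found.
        while os.path.exists(f"{stem}_{counter}{ext}"):
            counter += 1
        return f"{stem}_{counter}{ext}"

    if exist_option == "ThrowException":
        raise FileExistsError(f"File already exists: {path}.")

    raise ValueError(f"Unknown Exist option: {exist_option}")
```

With IncrementFileName, the suffix is inserted before the extension, so the file keeps its .parquet extension.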
Row group size
The number of rows buffered in memory before each Parquet row group is flushed to disk. Default: 5000. Minimum: 1.
Lower values reduce memory usage but weaken compression. Higher values improve compression but increase peak memory during writing. The default works for most workloads.
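As a rough illustration of the buffering described above, the following Python sketch splits a stream of rows into groups of at most `row_group_size` rows. The helper name `row_groups` is hypothetical and the sketch only shows the batching; it does not write Parquet and is not how Linx is implemented.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def row_groups(rows: Iterable[dict], row_group_size: int = 5000) -> Iterator[List[dict]]:
    """Yield lists of at most row_group_size rows, one list per row group."""
    if row_group_size < 1:
        raise ValueError("RowGroupSize must be at least 1.")
    iterator = iter(rows)
    while True:
        # Only one group is held in memory at a time.
        group = list(islice(iterator, row_group_size))
        if not group:
            return
        yield group  # a real writer would flush this group to disk here


# 12 rows with a row group size of 5 give groups of 5, 5 and 2 rows.
source = ({"id": i} for i in range(12))
print([len(g) for g in row_groups(source, row_group_size=5)])  # prints [5, 5, 2]
```

Peak memory in this sketch grows with the group size, which mirrors the trade-off described above.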
Schema Inference
The output schema is inferred from the first row at runtime. Each property becomes a column, with the .NET type mapped to the corresponding Parquet type.
| Linx Type | Parquet Type |
|---|---|
| integer | INT32 |
| double | DOUBLE |
| boolean | BOOLEAN |
| string | BYTE_ARRAY (UTF8) |
| decimal | FIXED_LEN_BYTE_ARRAY (DECIMAL) |
| DateTime | INT64 (TIMESTAMP millis) |
| byte | BYTE_ARRAY |
| int?, double?, etc. | Nullable column |
If there are no rows, an empty Parquet file is written with the correct schema.
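The mapping can be sketched in Python as a lookup from the type of each property in the first row. This is a minimal illustration under stated assumptions: Python types stand in for the Linx types in the table, the name `infer_schema` is hypothetical, and nullable columns are not modelled.

```python
import datetime
import decimal

# Assumption: Python types stand in for the Linx types listed in the table.
PARQUET_TYPES = {
    bool: "BOOLEAN",
    int: "INT32",
    float: "DOUBLE",
    str: "BYTE_ARRAY (UTF8)",
    decimal.Decimal: "FIXED_LEN_BYTE_ARRAY (DECIMAL)",
    datetime.datetime: "INT64 (TIMESTAMP millis)",
    bytes: "BYTE_ARRAY",
}


def infer_schema(first_row: dict) -> dict:
    """Map each property of the first row to a Parquet type name."""
    schema = {}
    for name, value in first_row.items():
        # Exact type lookup, so True maps to BOOLEAN rather than INT32.
        parquet_type = PARQUET_TYPES.get(type(value))
        if parquet_type is None:
            raise TypeError(f"No Parquet mapping for property {name!r}")
        schema[name] = parquet_type
    return schema


row = {"id": 1, "name": "Ada", "active": True, "score": 9.5}
print(infer_schema(row))
# prints {'id': 'INT32', 'name': 'BYTE_ARRAY (UTF8)', 'active': 'BOOLEAN', 'score': 'DOUBLE'}
```

Each key of the returned dictionary corresponds to one column in the output file.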
Validation
| Condition | Error |
|---|---|
| File path is empty | "File path cannot be null or empty." |
| Output directory does not exist | "Output directory does not exist: {dir}." |
| File exists and Exist option is ThrowException | "File already exists: {path}." |
| Row group size is less than 1 | "RowGroupSize must be at least 1." |
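The checks in the table can be expressed as a small Python sketch. The error messages are taken from the table; the function name `validate`, the use of `ValueError`, and the order of the checks are assumptions made for illustration only.

```python
import os


def validate(file_path: str, exist_option: str, row_group_size: int) -> None:
    """Raise ValueError with the documented message if a check fails."""
    if not file_path:
        raise ValueError("File path cannot be null or empty.")

    directory = os.path.dirname(file_path)
    if not os.path.isdir(directory):
        raise ValueError(f"Output directory does not exist: {directory}.")

    if exist_option == "ThrowException" and os.path.exists(file_path):
        raise ValueError(f"File already exists: {file_path}.")

    if row_group_size < 1:
        raise ValueError("RowGroupSize must be at least 1.")
```

Note that the output directory must already exist; this sketch, like the table, treats a missing directory as an error rather than creating it.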
How To
Export data from a database to a Parquet file
This example queries a database and writes the results to a Parquet file.
Steps:
- Add an ExecuteSQL function (or equivalent database query function) and set Return options to List of rows.
- From the Parquet plugin, drag WriteParquetFile onto the design canvas, placed after the query function.
- Set File path to the destination .parquet file path.
- Set Data to the result list from the database query (for example, ExecuteSQL.Result).
- Choose a Compression codec and Exist option as needed.