ReadParquetFile
ReadParquetFile reads rows from an Apache Parquet file.
The function has two output modes. Row by row streams rows one at a time through a ForEachRow execution path. List of rows returns all rows as a typed list.
Before the output properties are available in the Designer, you must load a Schema. Click the ellipsis (…) on the Schema property and select a template .parquet file. The column definitions are then populated automatically.
Properties
File path
The absolute or UNC path to the .parquet file to read. Linx expressions are supported.
Output type
Controls how rows are returned:
- Row by row: A `ForEachRow` execution path is added to the function and runs once per row. Each row exposes one property per column. Use this mode for large files or row-by-row processing.
- List of rows: The function returns a typed `List` of all rows. Use this mode when you need the full dataset at once.
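As a rough analogy outside Linx, the two modes resemble streaming with a generator versus materializing a list. The sketch below is plain Python, not Linx and not a Parquet library. The names `row_groups`, `iter_rows`, `for_each_row`, and `list_of_rows` are hypothetical, and the in-memory data only stands in for a file.

```python
from typing import Callable, Iterator

# Hypothetical stand-in for a Parquet file: a list of row groups,
# each row group being a list of rows (dicts keyed by column name).
row_groups = [
    [{"CustomerId": 1, "Name": "Ann", "Amount": 10.0}],
    [{"CustomerId": 2, "Name": "Bob", "Amount": 25.5},
     {"CustomerId": 3, "Name": "Cy", "Amount": 7.25}],
]

def iter_rows(groups: list) -> Iterator[dict]:
    """Yield rows one at a time, group by group."""
    for group in groups:
        yield from group

def for_each_row(groups: list, handler: Callable[[dict], None]) -> None:
    """Row-by-row idea: run the handler once per row, never holding all rows."""
    for row in iter_rows(groups):
        handler(row)

def list_of_rows(groups: list) -> list:
    """List-of-rows idea: materialize every row at once."""
    return list(iter_rows(groups))

seen = []
for_each_row(row_groups, lambda row: seen.append(row["Name"]))
all_rows = list_of_rows(row_groups)
```

The streaming form keeps memory use roughly constant per row, which is why the row-by-row mode is the one suggested for large files.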
Row group index
Specifies which Parquet row group to read. The default value of -1 reads all row groups in sequence.
Set to a non-negative integer to read only that row group (0-based).
If the specified index is out of range, the function raises an error identifying the total number of row groups available in the file.
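The selection rule can be sketched in a few lines of Python. This illustrates the documented behavior and is not the plugin's implementation; `select_row_groups` is a hypothetical name, and treating negative values other than -1 as out of range is an assumption, since the behavior for those values is not documented here.

```python
def select_row_groups(row_group_index: int, row_group_count: int) -> list:
    """Return the 0-based row group indexes that would be read."""
    if row_group_index == -1:
        # Default: read every row group in sequence.
        return list(range(row_group_count))
    if 0 <= row_group_index < row_group_count:
        # A non-negative index selects exactly one row group.
        return [row_group_index]
    # Assumption: any other value is reported as out of range.
    raise IndexError(
        f"Row group index {row_group_index} is out of range. "
        f"File has {row_group_count} row group(s)."
    )

all_groups = select_row_groups(-1, 3)
only_second = select_row_groups(1, 3)
try:
    select_row_groups(3, 3)
    message = None
except IndexError as exc:
    message = str(exc)
```

Because the index is 0-based, a file with three row groups accepts 0, 1, and 2; passing 3 produces the out-of-range error.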
Schema
Stores the column definitions used to type the function output. Click the ellipsis (…) to open the schema loader and select a .parquet template file. The columns are read from the file and stored in this property. The column list cannot be edited directly; to update it, reload it using the schema loader.
The template file is only needed at design time and does not need to be present at runtime.
Validation
| Condition | Error |
|---|---|
| File path is empty | "File path cannot be null or empty." |
| File does not exist at runtime | "Parquet file not found: {path}." |
| A specified column is not in the schema | "Column '{name}' not found in Parquet file schema." |
| Row group index is out of range | "Row group index {n} is out of range. File has {count} row group(s)." |
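
To show how the first three conditions in the table could be checked, here is a minimal Python sketch. It is not the plugin's code: `first_validation_error` is a hypothetical helper, the order of the checks is an assumption, and the `exists` parameter is only there so the example can run without a real file.

```python
import os
from typing import Callable, Iterable, Optional

def first_validation_error(
    file_path: Optional[str],
    requested_columns: Iterable[str],
    file_columns: Iterable[str],
    exists: Callable[[str], bool] = os.path.isfile,
) -> Optional[str]:
    """Return the first matching error message, or None if all checks pass."""
    if not file_path:
        return "File path cannot be null or empty."
    if not exists(file_path):
        return f"Parquet file not found: {file_path}."
    available = set(file_columns)
    for name in requested_columns:
        if name not in available:
            return f"Column '{name}' not found in Parquet file schema."
    return None

# Hypothetical path and column names, with file existence stubbed out.
error = first_validation_error(
    "/data/orders.parquet",
    ["Name", "Total"],
    ["Name", "Amount"],
    exists=lambda path: True,
)
```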
How To
Read data from a Parquet file into a database
This example reads every row from a Parquet file and inserts each row into a database table.
Steps:
- Get a sample `.parquet` file with the same schema as the file you want to read.
- From the Parquet plugin, drag ReadParquetFile onto the design canvas.
- Click the ellipsis (…) on the Schema property, select the sample file, and wait for the column list to load.
- Set File path to the target `.parquet` file (this can be a static file location, or an expression).
- Set Output type to RowByRow.
- Inside the ForEachRow loop, add an ExecuteSQL function and write an `INSERT` statement referencing the row fields (for example, `ForEachRow.CustomerId`, `ForEachRow.Name`, `ForEachRow.Amount`).
Note: For large data volumes, use DBBulkCopy instead of row-by-row inserts with ExecuteSQL.
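The trade-off in the note can be illustrated outside Linx with Python's built-in `sqlite3` module. This is only an analogy: the `Customers` table, the sample rows, and the helper names are hypothetical, and `executemany` is a batched insert, not an equivalent of DBBulkCopy.

```python
import sqlite3

# Parameterized INSERT using the column names from the example above.
INSERT_SQL = (
    "INSERT INTO Customers (CustomerId, Name, Amount) "
    "VALUES (:CustomerId, :Name, :Amount)"
)

def new_database() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical Customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Customers (CustomerId INTEGER, Name TEXT, Amount REAL)"
    )
    return conn

def insert_row_by_row(conn: sqlite3.Connection, rows: list) -> None:
    """One INSERT statement per row, like ExecuteSQL inside ForEachRow."""
    for row in rows:
        conn.execute(INSERT_SQL, row)
    conn.commit()

def insert_batched(conn: sqlite3.Connection, rows: list) -> None:
    """Hand all rows to the driver in a single call."""
    conn.executemany(INSERT_SQL, rows)
    conn.commit()

def row_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]

rows = [
    {"CustomerId": 1, "Name": "Ann", "Amount": 10.0},
    {"CustomerId": 2, "Name": "Bob", "Amount": 25.5},
]
conn_a = new_database()
insert_row_by_row(conn_a, rows)
conn_b = new_database()
insert_batched(conn_b, rows)
```

Both paths store the same rows. The batched form simply avoids issuing a separate call per row, which matters more as the row count grows.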
Read data from a Parquet file into a list for downstream processing
- From the Parquet plugin, drag ReadParquetFile onto the design canvas.
- Click the ellipsis (…) on the Schema property and select a template file to load the column definitions.
- Set File path to the target `.parquet` file.
- Set Output type to ListOfRows.
- Use `ReadParquetFile.Result` in downstream functions such as WriteParquetFile, a REST call, or any function that accepts a list. You can use a ForEach to loop through the items in the list.
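
As an analogy for the last step, the Python sketch below treats the result as an ordinary list that can be looped over or serialized as a request body. The `result` value is a hypothetical stand-in for `ReadParquetFile.Result`, and the column names are borrowed from the database example.

```python
import json

# Hypothetical stand-in for ReadParquetFile.Result: a list of typed rows.
result = [
    {"CustomerId": 1, "Name": "Ann", "Amount": 10.0},
    {"CustomerId": 2, "Name": "Bob", "Amount": 25.5},
]

# Loop through the items, as a ForEach over the list would.
total = 0.0
for row in result:
    total += row["Amount"]

# Serialize the whole list at once, for example as a REST request body.
payload = json.dumps(result)
```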