Spark SQL Test Doubles
spark_tests.sql module.
Defines test doubles for Spark SQL:
Queries on data frames run normally;
Modification statements are only logged and do not modify data.
- spark_tests.sql.FAKE_DF_WRITER = <spark_tests.sql.FakeDFWriter object>
FakeDFWriter singleton instance.
- class spark_tests.sql.FakeDFWriter
Stubs DataFrameWriter.
Logs the write operation instead of actually writing the data.
Because it is a singleton, it assumes that each test case performs only one write operation.
- path
when writing to a file, the file path
- name
when writing to a table, the table name
- save_format
the format used to save
- source
the source FakeDataFrame
- save_mode
specifies the behavior of the save operation: “error”, “errorifexists”, “append”, “overwrite”, “ignore”
- partition_by
names of partitioning columns
- save_options
all other save options
- is_saved
flag indicating whether a save operation has been executed
- clear()
Resets the writer to its default values, including:
self.save_format = "parquet"
self.save_mode = "errorifexists"
- format(format: str) FakeDFWriter
Logs format.
- mode(mode: str) FakeDFWriter
Logs mode.
- option(key: str, value: str) FakeDFWriter
Logs a configuration option.
- options(**options: str) FakeDFWriter
Logs configuration options.
- partitionBy(*cols: str) FakeDFWriter
Logs partition columns.
- save(path: Optional[str] = None, format: Optional[str] = None, mode: Optional[str] = None, partitionBy: Optional[List[str]] = None, **options: str) None
Logs current DataFrame rows that would be written to a file.
- saveAsTable(name: str, format: Optional[str] = None, mode: Optional[str] = None, partitionBy: Optional[Union[str, List[str]]] = None, **options: str) None
Logs current DataFrame rows that would be written to a table.
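The logging behavior described above can be sketched without Spark. The class below, SketchWriter, is a hypothetical stand-in written only for illustration; it is not the module's actual implementation. It assumes that each chained method stores its argument and returns the writer. The defaults for path, name, partition_by, save_options and is_saved are assumptions, since only save_format and save_mode are documented.

```python
from typing import Any, Dict, List, Optional


class SketchWriter:
    """Hypothetical stand-in for FakeDFWriter; not the module's real code."""

    def __init__(self) -> None:
        self.clear()

    def clear(self) -> None:
        # Documented defaults: "parquet" format and "errorifexists" mode.
        self.save_format: str = "parquet"
        self.save_mode: str = "errorifexists"
        # Assumed defaults for the remaining attributes.
        self.path: Optional[str] = None
        self.name: Optional[str] = None
        self.partition_by: List[str] = []
        self.save_options: Dict[str, Any] = {}
        self.is_saved: bool = False

    def format(self, format: str) -> "SketchWriter":
        self.save_format = format
        return self  # returning self allows call chaining

    def mode(self, mode: str) -> "SketchWriter":
        self.save_mode = mode
        return self

    def option(self, key: str, value: str) -> "SketchWriter":
        self.save_options[key] = value
        return self

    def partitionBy(self, *cols: str) -> "SketchWriter":
        self.partition_by = list(cols)
        return self

    def save(self, path: Optional[str] = None) -> None:
        # Record the target only; nothing is written to disk.
        self.path = path
        self.is_saved = True


writer = SketchWriter()
(writer.format("json")
       .mode("overwrite")
       .option("compression", "gzip")
       .partitionBy("year")
       .save("/tmp/out"))
```

After the chain runs, a test can assert on writer.save_format, writer.save_mode, writer.partition_by and writer.path instead of inspecting files on disk.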
- class spark_tests.sql.FakeDataFrame(real: DataFrame)
DataFrame proxy.
- real
real DataFrame
- alias_name
DataFrame alias name
- property write: FakeDFWriter
Returns FAKE_DF_WRITER.
Sets self as the writer's source DataFrame.
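A minimal sketch of how a write property can hand out a shared writer while recording its source. SketchWriter, SKETCH_WRITER and SketchDataFrame are hypothetical names used only for this illustration, and plain lists stand in for real DataFrames.

```python
from typing import Any, Optional


class SketchWriter:
    """Hypothetical shared writer that remembers which frame requested it."""

    def __init__(self) -> None:
        self.source: Optional["SketchDataFrame"] = None


# Module-level singleton, analogous in spirit to FAKE_DF_WRITER.
SKETCH_WRITER = SketchWriter()


class SketchDataFrame:
    """Hypothetical proxy that wraps a 'real' object."""

    def __init__(self, real: Any, alias_name: Optional[str] = None) -> None:
        self.real = real
        self.alias_name = alias_name

    @property
    def write(self) -> SketchWriter:
        # Register this frame as the source, then return the shared writer.
        SKETCH_WRITER.source = self
        return SKETCH_WRITER


first = SketchDataFrame(real=[("a", 1)])
second = SketchDataFrame(real=[("b", 2)])
w1 = first.write
w2 = second.write
```

Both accesses return the same object, and the second access overwrites the recorded source. This is why a singleton writer of this kind suits test cases that perform a single write.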
- class spark_tests.sql.FakeGroupedData(real: GroupedData)
GroupedData proxy.
- real
real GroupedData.
- agg(*exprs) FakeDataFrame
Computes aggregates.
Delegates to self.real.
Returns the result as a FakeDataFrame.
- pivot(pivot_col: str, values: Optional[List[str]] = None) GroupedData
Pivots a column of the current DataFrame.
Delegates to self.real
Returns the result as a FakeGroupedData.
- sum(*cols: str) FakeDataFrame
Computes the sum of each numeric column for each group.
Delegates to self.real.
Returns the result as a FakeDataFrame.
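The delegate-and-wrap pattern used by these methods can be sketched as follows. RealFrame and RealGrouped are hypothetical stand-ins for the Spark classes so that the example runs without Spark; SketchFrame and SketchGrouped are likewise illustrative names, not the module's classes.

```python
from typing import Dict, List, Tuple


class RealFrame:
    """Stand-in for a real DataFrame: just holds rows."""

    def __init__(self, rows: List[Tuple[str, int]]) -> None:
        self.rows = rows


class RealGrouped:
    """Stand-in for real GroupedData: maps a group key to its values."""

    def __init__(self, groups: Dict[str, List[int]]) -> None:
        self.groups = groups

    def sum(self, *cols: str) -> RealFrame:
        # This stand-in ignores cols and sums every group's values.
        return RealFrame([(key, sum(vals)) for key, vals in self.groups.items()])


class SketchFrame:
    """Hypothetical proxy around a frame."""

    def __init__(self, real: RealFrame) -> None:
        self.real = real


class SketchGrouped:
    """Hypothetical proxy around grouped data."""

    def __init__(self, real: RealGrouped) -> None:
        self.real = real

    def sum(self, *cols: str) -> SketchFrame:
        # Delegate the computation, then wrap the result in the proxy type
        # so later calls (such as a write) still go through the test double.
        return SketchFrame(self.real.sum(*cols))


grouped = SketchGrouped(RealGrouped({"a": [1, 2], "b": [3]}))
result = grouped.sum()
```

Wrapping every returned object keeps the whole call chain inside the proxies, so a query never ends up with an unwrapped object whose write would touch real storage.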
- class spark_tests.sql.FakeSparkSession(real: SparkSession)
SparkSession proxy.
Queries on data frames run normally;
Modification statements are only logged and do not modify data.
- real
real SparkSession.
- sql_queries
List of modification statements sent.
- clear() None
Clears the sql_queries list and FAKE_DF_WRITER.
- createDataFrame(data, schema=None) FakeDataFrame
Creates a FakeDataFrame.
Delegates creation to self.real
Returns created DataFrame as a FakeDataFrame
- property sparkContext: SparkContext
Returns SparkContext.
Delegates to self.real.
- sql(sql_statement: str) FakeDataFrame
Logs a sql_statement.
Appends sql_statement to self.sql_queries without changing any data.
Returns: empty FakeDataFrame.
- Parameters
sql_statement – the SQL statement to log
- table(table_name: str) FakeDataFrame
Returns specified table as FakeDataFrame.
Delegates to self.real. Result is returned as a FakeDataFrame. This behavior may be changed by subclasses.
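As an illustration of how a logging session might be used in a test, the sketch below defines a hypothetical SketchSession that records statements passed to sql() and a hypothetical function under test, archive_orders. Neither is part of spark_tests.sql. An empty list stands in for the empty FakeDataFrame that the real sql() is documented to return.

```python
from typing import List


class SketchSession:
    """Hypothetical stand-in for FakeSparkSession; not the module's real code."""

    def __init__(self, real: object) -> None:
        self.real = real
        self.sql_queries: List[str] = []

    def sql(self, sql_statement: str) -> list:
        # Record the statement; the wrapped session is never asked to run it.
        self.sql_queries.append(sql_statement)
        return []  # placeholder for an empty frame

    def clear(self) -> None:
        # Reset the log between test cases.
        self.sql_queries.clear()


def archive_orders(spark, year: int) -> None:
    """Hypothetical code under test that issues a modification statement."""
    spark.sql(
        f"INSERT INTO orders_archive SELECT * FROM orders WHERE year = {year}"
    )


session = SketchSession(real=None)
archive_orders(session, 2020)
logged = list(session.sql_queries)  # copy the log before clearing it
session.clear()
```

The test asserts on the logged statement text, so no table is modified while the SQL that would have run is still verified.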