Job Description
Role: Data Tester
Location: Louisville, KY (Remote)
Type: Contract
Job Summary:
-  We are seeking an experienced Data Tester with strong expertise in Databricks, PySpark, and Big Data ecosystems. The ideal candidate will have a solid background in testing data pipelines, ETL workflows, and analytical data models, ensuring data integrity, accuracy, and performance across large-scale distributed systems.
-  This role requires hands-on experience with Databricks, Spark-based data processing, and strong SQL validation skills, along with familiarity with data lake / Delta Lake testing, automation, and cloud environments (AWS, Azure, or GCP).
Key Responsibilities:
-  Validate end-to-end data pipelines developed in Databricks and PySpark, including data ingestion, transformation, and loading processes.
-  Develop and execute test plans, test cases, and automated scripts for validating ETL jobs and data quality across multiple stages.  
-  Conduct data validation, reconciliation, and regression testing using SQL, Python, and PySpark DataFrame APIs (see the illustrative sketch after this list).
-  Verify data transformations, aggregations, and schema consistency across raw, curated, and presentation layers.  
-  Test Delta Lake tables for schema evolution, partitioning, versioning, and performance.  
-  Collaborate with data engineers, analysts, and DevOps teams to ensure high-quality data delivery across the environment.  
-  Analyze Databricks job logs, Spark execution plans, and cluster metrics to identify and troubleshoot issues.  
-  Automate repetitive test scenarios and validations using Python / PySpark frameworks.  
-  Participate in Agile/Scrum ceremonies, contributing to sprint planning, estimations, and defect triage.  
-  Maintain clear documentation for test scenarios, execution reports, and data lineage verification.  
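For illustration only, the following is a minimal sketch of the kind of PySpark-based validation and reconciliation described above. The table names (`raw.orders`, `curated.orders`) and columns (`order_id`, `amount`) are hypothetical placeholders, not part of this role's actual environment.

```python
# Illustrative only: reconcile a hypothetical raw-layer table against its curated counterpart.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("data-validation-sketch").getOrCreate()

source_df = spark.table("raw.orders")       # hypothetical raw-layer table
target_df = spark.table("curated.orders")   # hypothetical curated-layer table

# Row-count reconciliation between layers.
source_count, target_count = source_df.count(), target_df.count()
assert source_count == target_count, f"Row counts differ: {source_count} vs {target_count}"

# Aggregate check: a measure column should sum to the same value after transformation.
source_total = source_df.agg(F.sum("amount")).first()[0]
target_total = target_df.agg(F.sum("amount")).first()[0]
assert source_total == target_total, "Sum of 'amount' does not reconcile between layers"

# Row-level check: business keys present in raw but missing from curated.
missing_keys = source_df.select("order_id").exceptAll(target_df.select("order_id"))
assert missing_keys.count() == 0, "Found order_ids in raw that are missing from curated"
```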
Required Qualifications:
- 8+ years of overall experience in data testing / QA within large-scale enterprise data environments.
- 5+ years of experience in testing ETL / Big Data pipelines, validating data transformations, and ensuring data integrity.  
- 4+ years of hands-on experience with Databricks, including notebook execution, job scheduling, and workspace management.  
- 4+ years of experience in PySpark (DataFrame APIs, UDFs, transformations, joins, and data validation logic).  
- 5+ years of strong proficiency in SQL (joins, aggregations, window functions, and analytical queries) for validating complex datasets.  
- 3+ years of experience with Delta Lake or data lake testing (schema evolution, ACID transactions, time travel, partition validation).  
- 3+ years of experience in Python scripting for automation and data validation tasks.  
- 3+ years of experience with cloud-based data platforms (Azure Data Lake, AWS S3, or GCP BigQuery).  
- 2+ years of experience in test automation for data pipelines using tools like pytest, PySpark test frameworks, or custom Python utilities (see the pytest sketch after this list).
- 4+ years of experience with data warehousing concepts, data modeling (Star/Snowflake), and data quality frameworks.
- 4+ years of experience with Agile / SAFe methodologies, including story-based QA and sprint deliverables.  
- 6+ years of experience applying analytical and debugging skills to identify data mismatches, performance issues, and pipeline failures.
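As a rough illustration of the pytest-style automation referenced above, the sketch below tests a made-up PySpark transformation (`add_total_column`); the function, columns, and values are hypothetical stand-ins for whatever pipeline logic is under test.

```python
# Illustrative only: pytest-style unit tests for a hypothetical PySpark transformation.
import pytest
from pyspark.sql import SparkSession, functions as F


@pytest.fixture(scope="session")
def spark():
    # Small local Spark session shared by all tests in the session.
    return SparkSession.builder.master("local[2]").appName("pipeline-tests").getOrCreate()


def add_total_column(df):
    # Hypothetical transformation under test: total = quantity * unit_price.
    return df.withColumn("total", F.col("quantity") * F.col("unit_price"))


def test_total_column_values(spark):
    input_df = spark.createDataFrame(
        [(1, 2, 5.0), (2, 3, 10.0)],
        ["order_id", "quantity", "unit_price"],
    )
    result = add_total_column(input_df).orderBy("order_id").collect()
    assert [row["total"] for row in result] == [10.0, 30.0]


def test_existing_columns_preserved(spark):
    input_df = spark.createDataFrame([(1, 2, 5.0)], ["order_id", "quantity", "unit_price"])
    result_df = add_total_column(input_df)
    # All original columns remain and exactly one new column is added.
    assert set(input_df.columns).issubset(result_df.columns)
    assert len(result_df.columns) == len(input_df.columns) + 1
```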
Preferred Qualifications:
-  Experience with CI/CD for Databricks or data testing (GitHub Actions, Jenkins, Azure DevOps).
-  Exposure to BI validation (Power BI, Tableau, Looker) for verifying downstream reports.  
-  Knowledge of REST APIs for metadata validation or system integration testing.  
-  Familiarity with big data tools like Hive, Spark SQL, Snowflake, and Airflow.  
-  Cloud certifications (e.g., Microsoft Azure Data Engineer Associate or AWS Big Data Specialty) are a plus.  
Job Tags
Contract work, Remote work