The Mission
As a Data Lake Engineer at ThreatMark, your primary mission will be to develop and maintain the Data Lake environment so that data analysis and machine learning tasks can be executed more easily and quickly. You will collaborate closely with our data analysts, data scientists, and engineering teams to ensure our data is accurate, accessible, and meaningful, ultimately enhancing the quality of our products. Your work will enable ThreatMark to achieve its business objectives with confidence, backed by reliable and insightful data analysis.
General
Seniority: Medior (mid-level, 3+ years of experience)
Employment Type: Full-time (Employee or Contractor)
Place of work: Offices in Brno, Bratislava, or Prague; fully remote possible
Responsibilities
In this role, you will:
Data Lake Development:
Build and maintain infrastructure for storing structured and semi-structured multi-tenant data in the Data Lake.
Maintain and develop the configuration of the AWS infrastructure and IAM policies.
Develop, automate, and orchestrate data ingestion, ETL processes, and maintenance jobs.
Create a layer of consolidated data to be used for data analysis and reporting (see the sketch after this list).
Enable and standardize the use of AWS services for publishing reports and interactive dashboards.
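To make the consolidation work concrete, here is a minimal PySpark sketch; the bucket names, paths, and the tenant and timestamp columns are illustrative assumptions rather than ThreatMark's actual layout:

# Minimal sketch: consolidate raw, semi-structured events into a
# partitioned Parquet layer. All names and paths are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("consolidate-events").getOrCreate()

# Read raw JSON events landed by the ingestion jobs.
raw = spark.read.json("s3://example-datalake-raw/events/")

# Light normalization: typed timestamp plus a date column for partitioning.
consolidated = (
    raw.withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("event_date", F.to_date("event_ts"))
)

# Write the consolidated layer, partitioned per tenant and day so that
# analysts and BI tools can prune partitions cheaply.
(consolidated.write
    .mode("overwrite")
    .partitionBy("tenant_id", "event_date")
    .parquet("s3://example-datalake-consolidated/events/"))

Partitioning by tenant and date is one common choice for multi-tenant lakes; the actual scheme would be agreed with the analysts and data scientists consuming the layer.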
Data Quality and Integrity:
Ensure high levels of data quality and integrity across all data sources and pipelines.
Implement monitoring and alerting mechanisms to detect and address data issues promptly.
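As one possible shape for such a check, the sketch below fails the pipeline run when a quality threshold is breached, letting the orchestrator raise the alert; the table path, column, and 1% threshold are illustrative assumptions:

# Minimal sketch of a post-load data quality gate; names and the
# threshold are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-check").getOrCreate()
df = spark.read.parquet("s3://example-datalake-consolidated/events/")

total = df.count()
null_ids = df.filter(F.col("tenant_id").isNull()).count()

# Fail loudly so the orchestrator marks the run failed and alerts on-call.
if total == 0 or null_ids / total > 0.01:
    raise ValueError(f"DQ check failed: {null_ids}/{total} rows missing tenant_id")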
Performance Optimization:
Set up data storage policies to lower storage costs (see the lifecycle sketch after this list).
Optimize data processing workflows for performance and efficiency.
Address bottlenecks and ensure data pipelines can scale with increasing data volumes.
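As an example of such a storage policy, S3 lifecycle rules can move aging data to cheaper storage classes automatically. The sketch below uses boto3; the bucket, prefix, and day cutoffs are placeholders, and in practice such rules would typically be codified in Terraform:

# Sketch: tier aging raw data down to cheaper S3 storage classes.
# Bucket, prefix, cutoffs, and retention are hypothetical.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-datalake-raw",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-raw-events",
                "Status": "Enabled",
                "Filter": {"Prefix": "events/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)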
Data Security and Compliance:
Ensure compliance with relevant data protection regulations and standards.
Implement data security measures to safeguard sensitive information.
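One concrete measure, for example, is enforcing default encryption at rest on the data lake buckets. The sketch below uses boto3, with a hypothetical bucket and KMS alias; as with lifecycle rules, this would normally live in Terraform:

# Sketch: enforce default SSE-KMS encryption on a data lake bucket.
# Bucket name and KMS alias are hypothetical.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_encryption(
    Bucket="example-datalake-consolidated",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "alias/example-datalake",
                },
                "BucketKeyEnabled": True,
            }
        ]
    },
)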
Collaboration and Support:
Work closely with data analysts and data scientists to understand their data needs and ensure data availability.
Provide support and guidance on best practices for writing ETL and data engineering tasks within the AWS environment and the technologies in use.
Aid in MLOps processes.
Automate data reports.
Participation in data analysis and research is welcome.
Qualifications
Proven experience (3+ years) developing data storage and manipulation solutions.
Strong proficiency in building and maintaining ETL and data pipelines.
Ability to communicate effectively in English.
Absolute Must-haves:
You will need to know, have experience with, or be able to quickly pick up the following, as you will be working with:
PySpark, Terraform, Airflow (a brief orchestration sketch follows this list)
AWS services: S3, IAM, EC2/EMR, Lambda, Glue, QuickSight (experience with their equivalents in Azure or GCP is also relevant)
Databricks
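To show how these tools compose, here is a minimal Airflow sketch of a daily pipeline; the DAG id, schedule, and task bodies are illustrative stubs, not an actual ThreatMark workflow:

# Minimal Airflow sketch: a daily DAG chaining ingestion, consolidation,
# and a data quality gate. All names are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest(**_):
    ...  # pull raw events into the landing bucket

def consolidate(**_):
    ...  # submit the PySpark consolidation job (e.g. via EMR or Glue)

def dq_check(**_):
    ...  # run the post-load quality gate; raising fails the run

with DAG(
    dag_id="datalake_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t_ingest = PythonOperator(task_id="ingest", python_callable=ingest)
    t_consolidate = PythonOperator(task_id="consolidate", python_callable=consolidate)
    t_dq = PythonOperator(task_id="dq_check", python_callable=dq_check)

    t_ingest >> t_consolidate >> t_dq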
What We Value
Ownership: A strong ability to take ownership and move towards shared goals without supervision.
Collaboration: A positive, can-do attitude with a no-excuses startup mindset, and clear, honest, and timely communication.
Innovation: A fervent passion to learn new skills and technologies, seeking improvement, being open to new ideas, and making data-driven decisions.
Adaptability: Thriving in a fast-paced and evolving environment, being flexible and ready to take on new challenges.
Practicality and efficiency: Applying the "80/20 rule" (Pareto principle) when solving tasks.
At ThreatMark, we value diversity and are committed to creating an inclusive environment for all employees. If you are passionate about data analysis and eager to contribute to a team that is making a significant impact in the cybersecurity landscape, we encourage you to apply. Please submit your resume and a brief cover letter explaining your interest in the role and how your skills align with our mission.