Location
Poland
Rate
Years of experience
10
About
I am an accomplished Big Data Developer with a robust background in creating and managing data solutions using Java, Python, and Scala. I have deep expertise in tools such as Apache Spark, Kafka, and AWS, and am skilled in both ETL and real-time data pipeline development. My career is marked by a commitment to merging technical prowess with business objectives to deliver strategic data architectures.

Currently, I lead a team of data engineers at The Stepstone Group Polska, where I focus on fostering a collaborative environment, coaching and mentoring team members, and staying abreast of the latest trends in big data and cloud technologies. In my role as Chapter Leader for Data Development, I build communities for data engineers, implement best practices, and conduct technical workshops.

I have significant experience in developing ETL/ELT pipelines, designing data marts, and creating PowerBI dashboards. My work includes preparing automatic data quality reports, creating REST services on AWS, and collaborating closely with product owners and data science teams. I have successfully led projects such as a rule-based data quality framework and a framework to speed up ETL pipeline creation. My technical stack includes MLOps, Terraform, Airflow, and various programming languages, demonstrating my versatility and depth of knowledge in the field.

Tech Stack
Big Data, Airflow, AWS and Cloud, Data modelling, ETL, GraphQL, Java, Kafka, MLOps, Python, SAS programming, Scala, Software design, Spark, SQL, Terraform

Experience
- Developing ETL/ELT Pipelines: Creating and managing ETL/ELT pipelines in Apache Spark, ensuring efficient data processing and transformation.
- Real-Time Pipeline Development: Developing real-time data pipelines using Kafka, allowing for timely data streaming and processing.
- Process Scheduling in Airflow: Managing and scheduling processes in Apache Airflow, including the development of custom operators and sensors.
- Data Modelling and Analysis: Designing data marts and global data domains for data platforms using Domain-Driven Design (DDD) principles.
- Creating REST Services on AWS: Developing RESTful services on AWS, facilitating seamless data integration and access.
- Technical Leadership and Mentorship: Coaching and mentoring team members, building a community for data engineers, and leading tech workshops to promote best practices.
- Collaborating with Data Science Teams: Working closely with data science teams and product owners to develop new features and ensure alignment with business goals.
Employment history
– Building a community for data engineers
– Working on best practices, development and training plans
– Strong focus on coaching and mentoring
– Conducting tech workshops
– Technology Stack: MLOps, Terraform, Airflow, Scala, Java, Python, Graph databases, SQL, ETL, AWS, Spark, Kafka, Data modelling
Data pipelines
– Developing ETL/ELT pipelines in Spark
– Developing real-time pipelines in Kafka
– Process scheduling in Airflow
– Developing custom operators/sensors in Airflow
Data modelling & analysis
– Designing data marts
– Designing global domains for data platform with DDD
– Preparing PowerBI dashboards
– Developing automatic data quality reports
– Creating REST services on AWS
– Creating POCs for emerging tech – e.g. Neo4J, GraphQL, MLFlow
– Close cooperation with product owners on new features
– Close cooperation with data science teams
– Talent recruitment and team members coaching
– Implementing cross team best practices, tools & frameworks
Example projects:
– Customer service reporting data mart
– Graph-based company normalization pipeline
– GUI for ML model deployment
– Rule-based data quality framework
– Framework to speed up ETL pipeline creation (AWS Glue/Spark)
– Data mart & reporting for Redshift usage
– Creating and managing efficient ETL/ELT pipelines to ensure seamless data processing and transformation.
– Building and maintaining real-time data pipelines using Kafka to support timely data streaming and processing.
– Managing and scheduling processes in Apache Airflow, including the development of custom operators and sensors.
– Designing data marts and global data domains for data platforms using Domain-Driven Design (DDD) principles.
– Developing RESTful services on AWS to facilitate seamless data integration and access.
– Creating and managing efficient ETL processes using SAS to ensure seamless data extraction, transformation, and loading.
– Designing and implementing end-to-end and integration tests to ensure the quality and reliability of data processes.
– Developing tools to generate synthetic data for testing and development purposes.
– Leading and contributing to the data integration efforts for a bank’s enterprise fraud management system.
– Working closely with data science teams to support new feature development and ensure data quality and integrity.
– Supporting the creation and maintenance of efficient ETL/ELT pipelines using Apache Spark.
– Assisting in the development and management of real-time data pipelines using Apache Kafka.
– Participating in designing data models and conducting data analysis to support business needs.
– Assisting with scheduling data processing tasks using Apache Airflow.
– Working closely with senior developers to learn best practices and gain hands-on experience in big data technologies.
– Technology Stack: Data engineering, SAS programming
– Creating ETL processes in SAS
– Preparing automated end-to-end and integration tests
– Creating data generation tools
– Main project: Data integration for a bank's enterprise fraud management system