Location
Poland
Rate
Years of experience
8+
About
I am a passionate data professional with a robust background in ETL processes, data science, big data processing, and cloud computation. Over the past six years, I've honed my skills in Python, SQL, and big data technologies such as Apache Spark and Hadoop. My experience spans multiple industries, including the banking, medical, e-commerce, and energy sectors. I've successfully designed and implemented data lakes, analytical platforms, and ELT processes using tools like Databricks, Airflow, and various AWS and Azure services. I am also proficient in creating and managing CI/CD pipelines and have a keen interest in optimizing processing costs and reducing downtime. My passion for continuous learning is reflected in my deepening knowledge of big data technologies, and my enthusiasms outside work include football, the used-car market, and Polish cuisine.

In my most recent role as a contractor and Lead Data Engineer, I advised on data lake maintenance and expansion, developed processes and architectures for data lakes, and reduced AWS EMR processing costs by 25%. My responsibilities included integrating frameworks, troubleshooting, and creating scalable solutions in cloud environments. Before that, I was a Big Data Developer at Lingaro, where I led projects, developed custom Apache Spark listeners, and built data processing engines. During my tenure at PwC Advisory, I orchestrated data workflows, supervised tasks, and optimized storage and processing solutions.

With a strong educational background in Big Data, Econometrics, and Mathematics from the Warsaw School of Economics and the University of Warsaw, I am well equipped to tackle complex data challenges and contribute effectively to any data-driven organization.

Tech Stack
Big Data, Apache Spark, AWS, Azure, CI/CD, Databricks, Hadoop, Python, SQL
Experience
- Developed and maintained data lakes and analytical platforms using Databricks on AWS and Azure, ensuring scalability, data security, and automation of infrastructure as code (IaC).
- Reduced production AWS EMR processing costs by 25% and decreased downtime by 37% through effective optimization techniques, resource management, and configuration adjustments.
- Designed and implemented efficient ETL/ELT processes using Apache Spark, Airflow, and Databricks, tailored to various industry requirements including banking, medical, and e-commerce sectors.
- Utilized AWS and Azure services (S3, IAM, Lambda, EC2, RDS, DynamoDB, Kinesis, Glue, ADLS, EventHubs) to build robust cloud-based data solutions and frameworks.
- Led project teams, distributed tasks, reviewed pull requests, and supervised the implementation of big data solutions, ensuring adherence to project timelines and quality standards.
- Developed and maintained continuous integration and continuous deployment (CI/CD) pipelines for schema migrations, workflows, and cluster pools using tools like Git, Jenkins, Azure Repos, and Azure Pipelines.
- Developed integration frameworks for FHIR format data and Azure Databricks, troubleshooting and optimizing Delta Live Tables jobs to ensure seamless data processing and integration.
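
The CI/CD pipelines for schema migrations mentioned above can be illustrated with a minimal versioned-migration runner. This is a sketch of the general technique only, using sqlite3 as a stand-in for the actual warehouse; the table definitions and migration statements are hypothetical, not taken from any of the projects described here.

```python
import sqlite3

# Hypothetical migrations, ordered by version. In a real CI/CD pipeline these
# would live as versioned SQL files applied automatically on deployment.
MIGRATIONS = [
    (1, "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)"),
    (2, "ALTER TABLE orders ADD COLUMN currency TEXT DEFAULT 'PLN'"),
]

def migrate(conn: sqlite3.Connection) -> int:
    """Apply every migration newer than the recorded schema version.

    Returns the schema version after the run; calling it again is a no-op.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0
    for version, statement in MIGRATIONS:
        if version > current:
            conn.execute(statement)
            conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
            current = version
    conn.commit()
    return current
```

Because each applied version is recorded, the same job can run on every deployment and only the new migrations take effect.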
Employment history
Advised on Data Lake maintenance and expansion (banking sector, EU, as Lead Data Engineer):
• Apache Spark / Airflow / AWS development (process / code / architecture) for Data Lake.
• Built an analytical platform on Databricks on AWS (addressing scalability, data security in the cloud, and IaC automation).
• Reduced production AWS EMR processing costs by 25% and decreased downtime by 37%.
Built a data processing framework for FHIR format compliant data (medical sector, US).
• Developed an integration framework bridging FHIR-format data, Azure, and Databricks (plus an automated cucumber / pytest-bdd test framework).
• Troubleshot Delta Live Tables jobs
Implemented a PoC for Azure Databricks-based Data Lake (e-commerce, PL).
• Designed ELT processes (PySpark, Databricks Workflows).
• Created CI/CD processes for schema migrations, workflows, cluster pools, etc.
Designed Apache Airflow architecture for an MFT business case (energy sector, PL).
Developed custom Apache Spark listeners (FMCG).
• Led project.
• Gathered logs produced by Spark jobs on Databricks.
• Visualized and pointed out weak spots, cost generators, and suboptimal queries.
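
The analysis side of this work, finding slow stages and cost generators in the gathered logs, can be approximated offline by parsing the JSON event log that Spark emits. A minimal sketch (the field names follow Spark's event-log format; the log lines fed in below would come from a real Databricks run, not be constructed by hand):

```python
import json

def slowest_stages(event_log_lines, top_n=3):
    """Rank completed stages by wall-clock duration from a Spark event log.

    Spark writes its event log as JSON lines; each SparkListenerStageCompleted
    event carries submission and completion timestamps in milliseconds.
    """
    durations = []
    for line in event_log_lines:
        event = json.loads(line)
        if event.get("Event") != "SparkListenerStageCompleted":
            continue
        info = event["Stage Info"]
        duration_ms = info["Completion Time"] - info["Submission Time"]
        durations.append((info["Stage ID"], info["Stage Name"], duration_ms))
    # Longest-running stages first: these are the usual cost generators.
    return sorted(durations, key=lambda s: s[2], reverse=True)[:top_n]
```

A custom listener streams the same events live instead of reading them after the fact, but the ranking logic is the same.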
Master Data Engineering (FMCG)
• Migrated SAP-based ETL to Microsoft Azure.
• Built a data processing engine from scratch (Databricks + Airflow + ADLS + Docker).
• Built REST APIs connecting the engine’s components.
Big Data Engineering (Financial Services)
• Developed a solution responsible for orchestrating workflows from data vendors (public and private sources, both structured and unstructured) to a machine learning engine.
• Reviewed pull requests, distributed tasks to team members, and supervised their work.
• Planned and executed data migration from HDFS to Azure Blob Storage.
• Optimized Apache Spark jobs and HDFS storage.
Created store chain expansion model (Retail):
• Designed and implemented a machine learning workflow responsible for the prediction of store income based on geographical and internal data.
Cloudera Hadoop cluster administration:
• Configured nodes and roles; installed and updated software.
• Monitored performance and troubleshot issues.
• Prepared and maintained a working environment for Data Scientists (JupyterHub, Cloudera Data Science Workbench, mlflow, RStudio Server, etc.).
• Completed Cloudera Administrator Training for Apache Hadoop.
• Developed a KPI-tracking Shiny application.
• Developed a process responsible for handling loan assignments to external debt collectors.
• Refactored an LGD calculation model from an Excel-based tool into a standalone Shiny dashboard.
• Assisted in developing and maintaining financial and operational reports, ensuring accurate data collection and presentation for internal review and decision-making.
• Conducted data analysis for various departments, identifying trends and insights to support strategic planning and operational improvements.
• Provided administrative support to the team, including scheduling meetings, preparing documentation, and assisting with project-management tasks.
Education history