About this blog
In this blog I collect technical guides and notes on the applications, frameworks, code snippets, and techniques I encounter.
Who am I 🤓?
Software engineer with 12 years of experience specializing in the Hadoop/Spark ecosystem and high-availability backend architectures. I write performance-critical code in Scala, Python, and Go, with an emphasis on design patterns and robust data engineering. My work spans the full lifecycle: from developing ETL/streaming pipelines to end-to-end cloud automation with OpenTofu and Terraform.
My Tech Stack ⚡️:
- Programming Languages:
- Scala, Java, Python, Golang, Typescript
- Big Data Ecosystem Frameworks:
- Apache Spark (Spark SQL, Spark Core, Spark Streaming), Apache Beam, Apache Hive, Apache Oozie, Apache Sqoop, Apache Airflow, Apache Kafka, Elasticsearch, Apache Solr
- Containerization & Cloud:
- Docker, Podman, Kubernetes (K8s), Terraform/OpenTofu
- Other Tools and Frameworks:
- Maven, SBT, Gradle, Git, GitLab, JIRA, JUnit, ScalaTest, npm, bun, poetry, Makefile, Justfile, uv
Some use cases I've worked on 🛠️:
Big Data / Cloud
- Lift and shift of big data applications from on-prem to cloud (GCP, Azure, AWS)
- Legacy data warehouse modernization
- Data migration
- Data Discovery
- Data validation and cleanup
- Streaming data analytics pipeline with Apache Kafka, Apache Spark
- Typical ETL (Extract Transform Load) workflows
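To make the last item concrete, here is a minimal sketch of the Extract-Transform-Load shape in plain Python. It is purely illustrative: the `extract`/`transform`/`load` helpers and the sample records are hypothetical stand-ins, not the Spark or Beam code a real pipeline would use.

```python
def extract():
    # Extract: read raw records from a source (hard-coded here for illustration;
    # in practice this would be a file, database, or Kafka topic).
    return [
        {"id": 1, "name": " Alice ", "amount": "120.50"},
        {"id": 2, "name": "Bob", "amount": "80.00"},
    ]

def transform(records):
    # Transform: clean whitespace and cast types on each record.
    return [
        {"id": r["id"], "name": r["name"].strip(), "amount": float(r["amount"])}
        for r in records
    ]

def load(records, sink):
    # Load: write cleaned records to a destination (an in-memory list here;
    # in practice a warehouse table or object store).
    sink.extend(records)
    return len(records)

warehouse = []
loaded = load(transform(extract()), warehouse)
print(loaded)  # 2
```

The same three-stage shape scales up directly: swap the in-memory source and sink for distributed reads and writes, and the transform for Spark SQL or DataFrame operations.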
Open Source contributions:
My blogs on other platforms
- Oozie vs Airflow? Apple to Orange comparison or both are the same?
- The Advantages of Apache Spark’s Tungsten Project for Spark SQL
- Java’s Unsafe Package and Its Role in Apache Spark’s Optimized JVM Performance
- Mastering Error Handling in Scala
- Scala Option Some None
- Testing Embedded H2 DB with Scala and Scalatest