About this blog
In this blog I collect technical guides and notes on the applications, frameworks, code snippets, and techniques I encounter.
Who am I 🤓?
Software engineer with 12 years of experience specializing in the Hadoop/Spark ecosystem and high-availability backend architectures. I write performance-critical code in Scala, Python, and Go, with an emphasis on design patterns and robust data engineering. My work spans the full lifecycle: from developing ETL/streaming pipelines to end-to-end cloud automation with OpenTofu and Terraform.
My Tech Stack ⚡️:
- Programming Languages:
- Scala, Java, Python, Golang, Typescript
- Big Data Ecosystem Frameworks:
- Apache Spark (Spark SQL, Spark Core, Spark Streaming), Apache Beam, Apache Hive, Apache Oozie, Apache Sqoop, Apache Airflow, Apache Kafka, Elasticsearch, Apache Solr
- Containerization & Cloud:
- Docker, Podman, Kubernetes (K8s), Terraform/OpenTofu
- Other Tools and Frameworks:
- Maven, SBT, Gradle, Git, GitLab, JIRA, JUnit, ScalaTest, npm, bun, poetry, Makefile, Justfile, uv
Some use cases I've worked on 🛠️:
Big Data / Cloud
- Lift and shift of big data applications from on-prem to cloud (GCP, Azure, AWS)
- Legacy data warehouse modernization
- Data migration
- Data Discovery
- Data validation and cleanup
- Streaming data analytics pipeline with Apache Kafka, Apache Spark
- Typical ETL (Extract Transform Load) workflows
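To make the last item concrete, here is a minimal sketch of the Extract-Transform-Load shape in plain Python. It is purely illustrative: the `extract`/`transform`/`load` helpers and the sample records are hypothetical stand-ins, not the Spark or Beam code a real pipeline would use.

```python
def extract():
    # Extract: read raw records from a source (hard-coded here for illustration;
    # in practice this would be a file, database, or Kafka topic).
    return [
        {"id": 1, "name": " Alice ", "amount": "120.50"},
        {"id": 2, "name": "Bob", "amount": "80.00"},
    ]

def transform(records):
    # Transform: clean whitespace and cast types on each record.
    return [
        {"id": r["id"], "name": r["name"].strip(), "amount": float(r["amount"])}
        for r in records
    ]

def load(records, sink):
    # Load: write cleaned records to a destination (an in-memory list here;
    # in practice a warehouse table or object store).
    sink.extend(records)
    return len(records)

warehouse = []
loaded = load(transform(extract()), warehouse)
print(loaded)  # 2
```

The same three-stage shape scales up directly: swap the in-memory source and sink for distributed reads and writes, and the transform for Spark SQL or DataFrame operations.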
Open Source contributions:
My blogs on other platforms
- Oozie vs Airflow? Apple to Orange comparison or both are the same?
- The Advantages of Apache Spark’s Tungsten Project for Spark SQL
- Java’s Unsafe Package and Its Role in Apache Spark’s Optimized JVM Performance
- Mastering Error Handling in Scala
- Scala Option Some None
- Testing Embedded H2 DB with Scala and Scalatest