970 Cloud Data Engineer jobs in Singapore
Senior Cloud Data Engineer
Posted today
Job Viewed
Job Description
We envision a data-driven healthcare system that empowers health, prevents disease and provides excellent value-based care.
To realize this vision, we design and implement innovative solutions essential for the desired health system transformation.
Our approach is agile, using rapid and continuous build-measure-learn cycles that identify, develop, deliver, and adapt technology to improve health. We work closely with implementation partners, allowing our teams to be responsive to real needs, as demonstrated by actual data, and nimble in evolving prototypes and scalable solutions.
The main approaches and tools used are:
- Agile development in a cloud-based environment
- Digital solution architecture design, development and operation
- Data analytics and modelling to understand healthcare flows and resource usage
- Artificial Intelligence
Current Initiatives:
- Digital application for mental wellness
- Passive and active sensing tools for patient lifestyle understanding and improvement
- Tele-health monitoring solutions for connected care
- Mobile patient and clinical applications
- Digital phenotyping
- Data analytics, including for patient flow analysis
We seek a skilled Data Engineer who will lead the design, development, and maintenance of scalable and secure data pipelines and architectures to support healthcare analytics and digital health solutions.
Key Responsibilities Include:
- Lead the design, development, and maintenance of scalable and secure data pipelines and architectures to support healthcare analytics and digital health solutions.
- Collaborate with cross-functional teams to understand data needs and translate them into robust data engineering solutions.
- Manage and optimize data ingestion, transformation, and storage processes across structured and unstructured data sources.
- Ensure data quality, integrity, and governance across all data platforms.
- Drive the adoption of best practices in data engineering, including CI/CD, testing, and monitoring.
- Mentor and guide junior data engineers and contribute to team capability building.
- Work closely with data scientists, analysts, and application developers to enable data-driven decision-making.
- Document data workflows, architecture, and operational procedures for knowledge sharing and continuity.
To succeed in this role, you must have:
- Bachelor's degree (or above) in Computer Science, Data Engineering, Information Systems, or a related field.
- At least 8 years of experience in data engineering, with a strong background in building and managing data pipelines and architectures.
You should have:
- Hands-on experience with cloud-based data platforms.
- Proficiency in data pipeline tools (e.g. Apache Airflow, Kafka, Spark).
- Strong SQL skills and experience with relational and NoSQL databases.
- Strong Python programming skills.
- Experience with data warehousing solutions (e.g., Snowflake, Redshift, BigQuery).
- Familiarity with data governance, security, and compliance in healthcare or regulated environments.
- Plus: Experience with healthcare data standards (e.g., FHIR, HL7) and DevOps practices.
- Leadership: Ability to lead data engineering initiatives, coordinate with stakeholders, and manage project timelines and deliverables.
- Good communication skills, both written and verbal.
Your tasks will include:
- Designing and implementing data pipelines and architectures
- Leading cross-functional teams
- Maintaining data quality and integrity
- Implementing data governance and security measures
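For context on the pipeline tooling named above (Apache Airflow, Python), here is a minimal, purely illustrative sketch of a daily ingestion DAG; the DAG name, tasks, and logic are hypothetical and not taken from this posting (assumes Airflow 2.4+).

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest_visits():
    # placeholder: land the day's clinic-visit extract in the data lake
    print("ingesting daily visit extract")

def transform_visits():
    # placeholder: cleanse and standardise records for analytics use
    print("transforming visit records")

with DAG(
    dag_id="daily_visit_pipeline",   # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # 'schedule' requires Airflow 2.4+
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest", python_callable=ingest_visits)
    transform = PythonOperator(task_id="transform", python_callable=transform_visits)
    ingest >> transform
```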
AWS Cloud Data Engineer
Posted today
Job Viewed
Job Description
Job Title:
AWS ETL Cloud Data Engineer
Job Overview:
The AWS Cloud Data Engineer will be responsible for designing, building, and maintaining scalable data pipelines and data infrastructure in the AWS cloud environment. This role requires expertise in AWS services, data modeling, ETL processes, and a keen understanding of best practices for data management and governance.
Key Responsibilities:
Design, build, and operationalize large-scale enterprise data solutions and applications using AWS data and analytics services in combination with third-party tools – including Spark/Python on Glue, Redshift, S3, Athena, RDS-PostgreSQL, Airflow, Lambda, DMS, CodeCommit, CodePipeline, CodeBuild, etc.
Design and build production ETL data pipelines from ingestion to consumption within a big data architecture, using DMS, DataSync, and Glue.
Understand existing applications (including on-premise Cloudera Data Lake) and infrastructure architecture.
Analyze, re-architect, and re-platform on-premise data warehouses to data platforms on AWS cloud using AWS or third-party services.
Design and implement data engineering, ingestion, and curation functions on AWS cloud using native AWS services or custom programming.
Perform detailed assessments of current data platforms and create transition plans to AWS cloud.
Collaborate with development, infrastructure, and data center teams to define Continuous Integration and Continuous Delivery processes following industry standards.
Work on hybrid Data Lake environments.
Coordinate with multiple stakeholders to ensure high standards are maintained.
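As a hedged illustration of the "Spark/Python on Glue" pipelines described in the responsibilities above, a skeletal Glue ETL job might look like the following; the bucket names and cleansing steps are placeholders, not details from this role.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# read raw CSV landed in S3 (bucket/prefix are placeholders)
raw = spark.read.option("header", "true").csv("s3://example-raw-bucket/input/")

# basic cleansing: drop empty rows and deduplicate before curation
curated = raw.dropna(how="all").dropDuplicates()

# write Parquet for consumption via Athena or Redshift Spectrum
curated.write.mode("overwrite").parquet("s3://example-curated-bucket/output/")

job.commit()
```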
Mandatory Skill-set:
Bachelor's Degree in Computer Science, Information Technology, or related fields.
5+ years of experience with ETL, Data Modeling, and Data Architecture to build Data Lakes. Proficient in ETL optimization, designing, coding, and tuning big data processes using PySpark.
3+ years of extensive experience on the AWS platform using core services such as AWS Athena, Glue PySpark, Redshift, RDS-PostgreSQL, S3, and Airflow for orchestration.
Good to Have Skills:
Fundamentals of the Insurance domain.
Functional knowledge of IFRS17.
Benefits:
14 days of annual leave (AL).
Company insurance coverage.
Data Engineer – Cloud & Big Data Platforms
Posted 11 days ago
Job Viewed
Job Description
- Architect and implement scalable ETL/ELT data pipelines across cloud and hybrid environments using Azure Data Factory, Databricks, PySpark, and Kafka
- Build and maintain data lakes and lakehouse architectures aligned with enterprise security and governance requirements
- Ensure secure data processing using tools like Apache Ranger, Protegrity, and Collibra
- Support federated querying and metadata management through Trino, Starburst, and EDC tools
- Lead CI/CD automation for data workflows using Azure DevOps, Git, and related DevOps practices
- Act as SME for L2/L3 support, performance tuning, and audit readiness in production environments
- Collaborate with analytics, ML, and compliance teams to deliver ML-ready, governed datasets
- Minimum 8 years of relevant experience in data engineering
- At least 5 years working with data platforms in banking/regulated environments
- Deep technical expertise with:
Azure Data Factory, Databricks, Apache Spark (Batch & Stream)
Kafka, PySpark, AWS S3, Cloudera, and Azure Synapse
Data security/governance tools: Apache Ranger, Collibra, Protegrity
DevOps and orchestration: Azure DevOps, Airflow, Control-M
Programming & querying: Python, SQL, Trino/Presto
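To make the Kafka/PySpark streaming requirement above concrete, here is a minimal Structured Streaming sketch; the broker address, topic, and landing path are invented for illustration, and the job assumes the spark-sql-kafka connector is on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka_stream_sketch").getOrCreate()

# read a Kafka topic as a streaming DataFrame (broker and topic are placeholders)
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "app-events")
    .load()
)

# Kafka delivers bytes; cast key/value to strings before parsing downstream
parsed = events.select(col("key").cast("string"), col("value").cast("string"))

# land the stream in the lake with checkpointing for fault tolerance
query = (
    parsed.writeStream.format("parquet")
    .option("path", "s3a://example-lake/events/")
    .option("checkpointLocation", "s3a://example-lake/checkpoints/events/")
    .start()
)
query.awaitTermination()
```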
Data Cloud Engineer #IJf
Posted today
Job Viewed
Job Description
Roles and Responsibilities:
ETL & Data Engineering:
- Build and maintain robust ETL pipelines to ingest, transform, and load high-volume, high-velocity data from IT infrastructure, monitoring systems, and cloud environments.
- Develop batch and real-time data flows using stream- and batch-processing frameworks (e.g., Spark, Flink, Kafka).
- Optimize ETL jobs for scalability, fault tolerance, and low latency.
- Implement data validation, cleansing, and normalization processes for consistent AI model input.
- Integrate with AIOps platforms and ML pipelines using REST APIs or event-driven architectures.
- Develop and maintain robust data pipelines for ingesting, filtering, transforming, and loading data from various sources (e.g., network devices, appliances, databases, APIs, cloud storage).
Application & Backend Development:
- Design and build backend services (microservices, APIs) in Python, Go, Java, or Ruby to support data ingestion, metadata services, and configuration management.
DevOps & Orchestration:
- Use tools like Elastic Stack, Apache Airflow, Prefect, or Dagster to schedule and monitor ETL jobs.
- Implement CI/CD pipelines for deploying ETL services and full-stack apps.
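As a rough sketch of the orchestration pattern above (Prefect is one of the named options), the flow below chains extract, validate/normalise, and load tasks; the sample records and the print-based load step are invented for illustration.

```python
from prefect import flow, task

@task(retries=2)
def extract() -> list[dict]:
    # placeholder: pull monitoring records from an API, queue, or agent
    return [{"host": "app-01", "cpu": "93"}, {"host": "app-02", "cpu": None}]

@task
def normalise(records: list[dict]) -> list[dict]:
    # validation/cleansing: drop incomplete rows and cast types for model input
    return [
        {"host": r["host"], "cpu": float(r["cpu"])}
        for r in records
        if r.get("cpu") is not None
    ]

@task
def load(records: list[dict]) -> None:
    # placeholder: write to the warehouse (e.g. Snowflake) in a real pipeline
    print(f"loaded {len(records)} rows")

@flow(log_prints=True)
def monitoring_etl():
    load(normalise(extract()))

if __name__ == "__main__":
    monitoring_etl()
```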
Required Skills & Experience:
- 4+ years of experience in data engineering with a proven track record of delivering large-scale data solutions.
- Strong expertise in Snowflake, Advanced SQL, GBI ETL framework, and Python.
- Hands-on experience in designing and implementing data pipelines, data integration, and data warehousing solutions.
- ETL Frameworks: Apache NiFi, Spark, Airflow, Flink, Kafka, Talend
- Orchestration: Airflow, Prefect, Dagster
- Data Storage: PostgreSQL, MongoDB, Elasticsearch, Snowflake, BigQuery
- Streaming Platforms: Kafka, Kinesis, Pub/Sub
- Prior experience in stakeholder management, including requirements gathering, design playback, and presentations.
- Familiarity with cloud platforms such as AWS, Azure, or Google Cloud.
Interested applicants, please email and look for Jensen Fang Lifa.
Recruit Express Pte Ltd
EA License No. 99C4599
EA Personnel Registration Number: R2197080
We regret that only shortlisted candidates will be contacted.
Chief Data Warehousing Specialist
Posted today
Job Viewed
Job Description
As a seasoned professional in the field of data warehousing, your primary duties will include designing and implementing end-to-end software solutions for large-scale projects and change requests. You will be responsible for supporting users by addressing performance issues or providing query responses.
- Develop frameworks to build data pipelines using PySpark, Scala, and Java.
- Implement Hadoop-based Data marts using Spark-based frameworks.
Key Requirements
- Have good working experience in core technical areas using Python, Java, PySpark, and Scala.
- Be proficient in Cloudera CDH/CDP components.
- Develop Spark-based ingestion frameworks and build feature pipelines for AI/ML model execution and large-scale data warehouse/data mart support.
Added Advantage
Scripting knowledge in Bash, Python, and Perl can be beneficial for automating tasks.
Good technical knowledge of RHEL/Linux and Unix hardware, operating system, and system services is a plus.
Principal Cloud Engineer - Data & AI
Posted today
Job Viewed
Job Description
Who are we?
Equinix is the world’s digital infrastructure company, operating over 260 data centers across the globe. Digital leaders harness Equinix's trusted platform to bring together and interconnect foundational infrastructure at software speed. Equinix enables organizations to access all the right places, partners and possibilities to scale with agility, speed the launch of digital services, deliver world-class experiences and multiply their value, while supporting their sustainability goals.
Our culture is based on collaboration and the growth and development of our teams. We hire hardworking people who thrive on solving challenging problems and give them opportunities to hone new skills and try new approaches, as we grow our product portfolio with new software and network architecture solutions. We embrace diversity in thought and contribution and are committed to providing an equitable work environment that is foundational to our core values as a company and is vital to our success.
Job Summary
We’re looking for a Principal Cloud Engineer with a strong foundation in multi-cloud and multi-region deployment, data architecture, distributed systems, and modern cloud-native platforms to architect, build, and maintain intelligent infrastructure and systems that power our AI, GenAI, and data-intensive workloads.
You’ll work closely with cross-functional teams, including data scientists, ML and software engineers, and product managers, and play a key role in designing a highly scalable platform to manage the lifecycle of data pipelines, APIs, real-time streaming, and agentic GenAI workflows, while enabling federated data architectures. The ideal candidate will have a strong background in building and maintaining scalable AI and data platforms, optimizing workflows, and ensuring the reliability and performance of data platform systems.
Responsibilities
Cloud Architecture & Engineering
Deep expertise in designing, implementing, and managing architectures across multiple cloud platforms (e.g., AWS, Azure, GCP)
Proven experience in architecting hybrid and multi-cloud solutions, including interconnectivity, security, workload placement, and DR strategies
Strong knowledge of cloud-native services (e.g., serverless, containers, managed databases, storage, networking)
Experience with enterprise-grade IAM, security controls, and compliance frameworks across cloud environments
AI & GenAI Platform Integration
Integrate LLM APIs (OpenAI, Gemini, Claude, etc.) into platform workflows for intelligent automation and enhanced user experience
Build and orchestrate multi-agent systems using frameworks like CrewAI, LangGraph, or AutoGen for use cases such as pipeline debugging, code generation, and MLOps
Experience in developing and integrating GenAI applications using MCP and orchestration of LLM-powered workflows (e.g., summarization, document Q&A, chatbot assistants, and intelligent data exploration)
Hands-on expertise building and optimizing vector search and RAG pipelines using tools like Weaviate, Pinecone, or FAISS to support embedding-based retrieval and real-time semantic search across structured and unstructured datasets
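As a hedged illustration of the embedding-based retrieval building block mentioned above (FAISS is one of the named tools), the snippet below indexes document vectors and retrieves the nearest neighbours for a query; random vectors stand in for real embeddings.

```python
import numpy as np
import faiss  # pip install faiss-cpu

dim = 384  # embedding dimensionality; depends on the embedding model used
# stand-in for document embeddings produced by an embedding model
doc_vectors = np.random.rand(1000, dim).astype("float32")

index = faiss.IndexFlatL2(dim)  # exact L2 nearest-neighbour index
index.add(doc_vectors)

# embed the query with the same model, then retrieve the top-k passages
query_vector = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query_vector, 5)
print(ids[0])  # indices of candidate passages to place in the LLM prompt
```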
Engineering Enablement
Create extensible CLIs, SDKs, and blueprints to simplify onboarding, accelerate development, and standardize best practices
Streamline onboarding, documentation, and platform implementation & support using GenAI and conversational interfaces
Collaborate across teams to enforce cost, reliability, and security standards within platform blueprints.
Work with engineering by introducing platform enhancements, observability, and cost optimization techniques
Foster a culture of ownership, continuous learning, and innovation
Automation, IaC, CI/CD
Mastery of Infrastructure as Code (IaC) tools — especially Terraform, Terragrunt, and CloudFormation / ARM / Deployment Manager
Experience building and managing cloud automation frameworks (e.g., using Python, Go, or Bash for orchestration and tooling)
Hands-on experience with CI/CD pipelines (e.g., GitHub Actions) for cloud resource deployments
Expertise in implementing policy-as-code & Compliance-as-code (e.g., Open Policy Agent, Sentinel)
Security, Governance & Cost
Strong background in implementing cloud security best practices (network segmentation, encryption, secrets management, key management, etc.).
Experience with multi-account / multi-subscription / multi-project governance models, including landing zones, service control policies, and organizational structures
Ability to design for cost optimization, tagging strategies, and usage monitoring across cloud providers
Monitoring & Operations
Familiarity with cloud monitoring, logging, and observability tools (e.g., CloudWatch, Azure Monitor, GCP Operations Suite, Datadog, Prometheus)
Experience with incident management and building self-healing cloud architectures
Platform & Cloud Engineering
Develop and maintain real-time and batch data pipelines using tools like Airflow, dbt, Dataform, and Dataflow/Spark
Design and develop event-driven architectures using Apache Kafka, Google Pub/Sub, or equivalent messaging systems
Build and expose high-performance data APIs and microservices to support downstream applications, ML workflows, and GenAI agents
Architect and manage multi-cloud and hybrid cloud platforms (e.g., GCP, AWS, Azure) optimized for AI, ML, and real-time data processing workloads
Build reusable frameworks and infrastructure-as-code (IaC) using Terraform, Kubernetes, and CI/CD to drive self-service and automation
Ensure platform scalability, resilience, and cost efficiency through modern practices like GitOps, observability, and chaos engineering
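By way of illustration of the "high-performance data APIs" bullet above, a minimal microservice endpoint might look like the sketch below; FastAPI is an assumed choice (the posting does not name a framework) and the metric data is invented.

```python
from fastapi import FastAPI, HTTPException

app = FastAPI(title="example-metrics-api")  # hypothetical service name

# stand-in for a query against the curated data platform
_DAILY_USAGE = {"2024-01-01": 1532, "2024-01-02": 1610}

@app.get("/usage/{day}")
def get_daily_usage(day: str) -> dict:
    """Serve a curated daily-usage metric to downstream apps and ML workflows."""
    if day not in _DAILY_USAGE:
        raise HTTPException(status_code=404, detail="no data for that day")
    return {"day": day, "active_devices": _DAILY_USAGE[day]}

# run locally with: uvicorn example_metrics_api:app --reload
```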
Leadership & Collaboration
Experience leading cloud architecture reviews, defining standards, and mentoring engineering teams
Ability to work cross-functionally with security, networking, application, and data teams to deliver integrated cloud solutions
Strong communication skills to engage stakeholders at various levels, from engineering to executives
Qualifications
15+ years of hands-on experience in Platform or Data Engineering, Cloud Architecture, Multi-Cloud Multi-Region Deployment & Architecture, AI Engineering roles
Strong programming background in Java, Python, SQL, and one or more general-purpose languages
Deep knowledge of data modeling, distributed systems, and API design in production environments
Proficiency in designing and managing Kubernetes, serverless workloads, and streaming systems (Kafka, Pub/Sub, Flink, Spark)
Experience with metadata management, data catalogs, data quality enforcement, and semantic modeling & automated integration with Data Platform
Proven experience building scalable, efficient data pipelines for structured and unstructured data
Experience with GenAI/LLM frameworks and tools for orchestration and workflow automation
Experience with RAG pipelines, vector databases, and embedding-based search
Familiarity with observability tools (Prometheus, Grafana, OpenTelemetry) and strong debugging skills across the stack
Experience with ML Platforms (MLFlow, Vertex AI, Kubeflow) and AI/ML observability tools
Prior implementation of data mesh or data fabric in a large-scale enterprise
Experience with Looker Modeler, LookML, or semantic modeling layers
Preferred Certifications
AWS Certified Solutions Architect – Professional
Google Professional Cloud Architect
Microsoft Certified: Azure Solutions Architect Expert
HashiCorp Certified: Terraform Associate
Other relevant certifications (CKA, CKS, CISSP cloud concentration) are a plus.
Why You’ll Love This Role
Drive technical leadership across AI-native data platforms, automation systems, and self-service tools
Collaborate across teams to shape the next generation of intelligent platforms in the enterprise
Work with a high-energy, mission-driven team that embraces innovation, open-source, and experimentation
Equinix is committed to ensuring that our employment process is open to all individuals, including those with a disability. If you are a qualified candidate and need assistance or an accommodation, please let us know by completing this form.
Equinix is an Equal Employment Opportunity and, in the U.S., an Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to unlawful consideration of race, color, religion, creed, national or ethnic origin, ancestry, place of birth, citizenship, sex, pregnancy / childbirth or related medical conditions, sexual orientation, gender identity or expression, marital or domestic partnership status, age, veteran or military status, physical or mental disability, medical condition, genetic information, political / organizational affiliation, status as a victim or family member of a victim of crime or abuse, or any other status protected by applicable law.
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Internet Publishing
Data Engineer (Cloud Migration)
Posted today
Job Viewed
Job Description
Years of Experience: At least 5-6 years' experience in developing, implementing, and maintaining IT systems.
The Cloud Engineer is responsible for designing, deploying, and managing cloud infrastructure. This role includes performing cloud system monitoring and maintenance, troubleshooting and resolving cloud-related issues, and documenting cloud processes and procedures.
Cloud Infrastructure Design and Development:
• Design and develop cloud infrastructure that meets business requirements.
• Create and maintain cloud resources and configurations.
• Ensure cloud resource integrity and consistency across environments.
Implementation and Maintenance
• Implement, configure, and upgrade cloud services and related software.
• Perform regular cloud maintenance tasks, such as backups, patching, and updates.
• Monitor cloud performance and implement improvements as needed.
Performance Tuning and Optimization
• Analyze and optimize cloud performance, including resource tuning and configuration.
• Monitor cloud performance metrics and address potential issues proactively.
• Develop and implement strategies for cloud scalability and capacity planning.
Security and Compliance
• Implement and manage cloud security measures, including access controls and encryption.
• Ensure cloud services comply with relevant industry standards and regulations.
• Conduct regular security assessments and vulnerability testing.
Collaboration and Communication
• Work closely with software developers to ensure seamless integration of cloud services with applications.
• Communicate complex technical concepts to non-technical stakeholders.
• Participate in cross-functional project teams and contribute to project planning and execution.
Troubleshooting and Support
• Diagnose and resolve cloud issues, including performance bottlenecks and connectivity problems.
• Provide on-call support for critical cloud services as needed.
• Maintain comprehensive documentation for cloud services, processes, and procedures.
Requirements:
• Bachelor's or master's degree in computer science, data engineering, or a related field.
• Minimum 7 years of experience in data engineering, with expertise in AWS services, Databricks, and/or Informatica IDMC.
• Proficiency in programming languages such as Python, Java, or Scala for building data pipelines.
• Evaluate potential technical solutions and make recommendations to resolve data issues, especially on performance assessment for complex data transformations and long-running data processes.
• Strong knowledge of SQL and NoSQL databases.
• Familiarity with data modeling and schema design.
• Excellent problem-solving and analytical skills.
• Strong communication and collaboration skills.
• AWS certifications (e.g., AWS Certified Data Analytics - Specialty), Databricks certifications, and Informatica certifications are a plus.
Data Engineer (Cloud Migration)
Posted 8 days ago
Job Viewed
Job Description
POSITION OVERVIEW: Industry Consulting Consultant
POSITION GENERAL DUTIES AND TASKS:
Roles And Responsibilities:
• Design and architect data storage solutions, including databases, data lakes, and warehouses, using AWS services such as Amazon S3, Amazon RDS, Amazon Redshift, and Amazon DynamoDB, along with Databricks' Delta Lake. Integrate Informatica IDMC for metadata management and data cataloging.
• Create, manage, and optimize data pipelines for ingesting, processing, and transforming data using AWS services like AWS Glue, AWS Data Pipeline, and AWS Lambda, along with Databricks for advanced data processing and Informatica IDMC for data integration and quality.
• Integrate data from various sources, both internal and external, into AWS and Databricks environments, ensuring data consistency and quality, while leveraging Informatica IDMC for data integration, transformation, and governance.
• Develop ETL (Extract, Transform, Load) processes to cleanse, transform, and enrich data, making it suitable for analytical purposes using Databricks' Spark capabilities and Informatica IDMC for data transformation and quality.
• Monitor and optimize data processing and query performance in both AWS and Databricks environments, making necessary adjustments to meet performance and scalability requirements. Utilize Informatica IDMC for optimizing data workflows.
• Implement security best practices and data encryption methods to protect sensitive data in both AWS and Databricks, while ensuring compliance with data privacy regulations. Employ Informatica IDMC for data governance and compliance.
• Implement automation for routine tasks, such as data ingestion, transformation, and monitoring, using AWS services like AWS Step Functions, AWS Lambda, Databricks Jobs, and Informatica IDMC for workflow automation.
• Maintain clear and comprehensive documentation of data infrastructure, pipelines, and configurations in both AWS and Databricks environments, with metadata management facilitated by Informatica IDMC.
• Collaborate with cross-functional teams, including data scientists, analysts, and software engineers, to understand data requirements and deliver appropriate solutions across AWS, Databricks, and Informatica IDMC.
• Identify and resolve data-related issues and provide support to ensure data availability and integrity in both AWS, Databricks, and Informatica IDMC environments.
• Optimize AWS, Databricks, and Informatica resource usage to control costs while meeting performance and scalability requirements.
• Stay up-to-date with AWS, Databricks, Informatica IDMC services, and data engineering best practices to recommend and implement new technologies and techniques.
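As a hedged sketch of the automation bullet above (Lambda triggering downstream processing), the handler below starts a Glue job when a new object lands in S3; the job name and argument names are hypothetical.

```python
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    """Triggered by an S3 put event; kicks off a curation Glue job for the new object."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    # start the (hypothetical) curation job, passing the new object as an argument
    run = glue.start_job_run(
        JobName="curate-landing-data",
        Arguments={"--source_path": f"s3://{bucket}/{key}"},
    )
    return {"jobRunId": run["JobRunId"]}
```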
Requirements / Qualifications
• Bachelor’s or master’s degree in computer science, data engineering, or a related field.
• Minimum 7 years of experience in data engineering, with expertise in AWS services, Databricks, and/or Informatica IDMC.
• Proficiency in programming languages such as Python, Java, or Scala for building data pipelines.
• Evaluate potential technical solutions and make recommendations to resolve data issues, especially on performance assessment for complex data transformations and long-running data processes.
• Strong knowledge of SQL and NoSQL databases.
• Familiarity with data modeling and schema design.
• Excellent problem-solving and analytical skills.
• Strong communication and collaboration skills.
• AWS certifications (e.g., AWS Certified Data Analytics - Specialty), Databricks certifications, and Informatica certifications are a plus.
Preferred Skills:
• Experience with big data technologies like Apache Spark and Hadoop on Databricks.
• Knowledge of data governance and data cataloguing tools, especially Informatica IDMC.
• Familiarity with data visualization tools like Tableau or Power BI.
• Knowledge of containerization and orchestration tools like Docker and Kubernetes.
• Understanding of DevOps principles for managing and deploying data pipelines.
• Experience with version control systems (e.g., Git) and CI/CD pipelines.