Company-wise Questions
Real interview questions from 25+ companies
Google
BigQuery, Dataflow, Beam, Pub/Sub, Bigtable, GCP pipelines
Amazon
S3, Glue, Athena, Kinesis, Redshift, EMR, Lake Formation
Microsoft
Azure ADF, Synapse, Databricks, Spark, managed identity, DR
Apple
Spark vs MapReduce, Parquet, SQL, binary trees, ETL optimization
Meta
Product metrics, SQL, data modeling, A/B testing, streaming pipelines
Netflix
Streaming schema, recommendations, SQL, Spark, reservoir sampling
NVIDIA
GPU telemetry, Kafka, CDC, data quality, governance
Tesla
Vehicle telemetry, SQL, algorithms, shortest path, manufacturing QA
Walmart
Retail DWH, POS streaming, SCD, compliance, Spark at scale
Atlassian
Event pipelines, SQL attribution, graph modeling, Jira analytics
JPMorgan Chase
Spark, SQL, distributed systems, system design
Goldman Sachs
SQL, Java, algorithms, data modeling, Spark
HSBC
Real-time pipelines, GDPR, Spark, Azure, SQL
American Express
SQL, Python, Spark, data modeling
Flipkart
E-commerce data modeling, Spark, SQL, pipelines
Deloitte
Azure, ADF, Databricks, Delta Lake, PySpark
PwC
Azure ADF, Delta Lake, Spark, SQL, CI/CD
KPMG
Databricks, ADF, ADLS, SQL, PySpark, Python
EY
SQL, PySpark, cloud, data engineering
Infosys
Azure, Databricks, ADF, Delta Lake, SQL
TCS
GCP, Azure, Python, Spark, SQL
Wipro
Catalyst optimizer, AQE, skew joins, Data Vault, CDC vs full refresh
Cognizant
PySpark, ADLS, Azure SQL, Spark, SQL
Capgemini
Delta Lake, SCD, Data Mesh, Parquet, Lambda vs Kappa, cost optimization
EPAM
Spark internals, Delta Lake, GCP, SQL
Publicis Sapient
System design, SQL, clickstream analytics
Persistent
PySpark, ADF, Synapse, SQL, Azure DevOps
Virtusa
Lakehouse, CDC, schema drift, window functions, columnar storage
Coforge
ETL vs ELT, surrogate keys, PolyBase, Medallion, event-driven ingestion
Amdocs
Telecom CDR pipelines, CDC in Azure, Delta Lake, ADF error handling
Accenture
Cloud pipelines, SCD types, DLT, Unity Catalog, CI/CD for data
Tata Digital
SQL, PySpark, GCP, BigQuery, Airflow
Tredence
Spark, Airflow, SQL, Python, system design
Fractal
GCP, Azure, PySpark, Airflow, Databricks
Quantiphi
GCP, BigQuery, Spark, ML pipelines
Nielsen
GCP, Spark, SQL, Python, Airflow
LatentView
ADF, ADLS, Synapse, Azure DevOps, Python
EXL
SparkSession, ADLS, Stream Analytics, SQL
Synechron
Spark execution, Airflow, Hive, SQL
Datametica
PySpark, SQL, nested JSON, ETL
Saama Technology
PySpark, Spark optimization, SQL
HCL
PySpark coding, deduplication, aggregations
NTT DATA
SQL, Spark, Hadoop, Python, ETL fundamentals
AppZen
SQL: self-joins, running totals, deduplication
66degrees
Python, Spark, BigQuery, GCP
AP Moller - Maersk
PySpark, SQL, data pipelines
Deutsche Bank
SQL, Spark, Python, data modeling