Tuesday, July 16, 2024

How to fix lazy evaluation overhead in Spark

Techniques for handling lazy evaluation overhead:

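1. Caching and Persisting RDDs: because transformations are lazy, an RDD that feeds several actions is recomputed from its lineage each time. Calling cache() or persist() keeps it after the first computation.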

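2. Using Broadcast Variables: send a read-only value, such as a lookup table, to each executor once instead of shipping it with every task.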

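3. Avoiding Operations that Cause Shuffling: wide operations such as groupByKey and reduceByKey move data across the network. When an aggregation is unavoidable, prefer reduceByKey over groupByKey, because it combines values within each partition before the shuffle.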

4. Using the Right Data Structures: prefer DataFrames and Datasets, whose lazily built query plans are optimized by Spark's Catalyst optimizer before they run.

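5. Tuning Spark Configurations: use autoscaling settings such as dynamic allocation (spark.dynamicAllocation.enabled), which lets Spark add and remove executors as the workload changes.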

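6. Checkpointing: for long or iterative lineages, checkpoint() saves an RDD to reliable storage and truncates its lineage, so the whole chain does not have to be recomputed later.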

