Techniques to handle lazy evaluation overhead
1. Caching and Persisting RDDs
2. Using Broadcast Variables
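3. Avoiding operations that cause shuffling: groupByKey and reduceByKey both shuffle data, but reduceByKey combines values within each partition first, so it moves less data and is usually the better choice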
4. Using the right data structures: DataFrames and Datasets
5. Tuning Spark configurations: auto-scaling configs (for example, dynamic allocation)
6. Checkpointing
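
To make the first point concrete, here is a small sketch of why caching matters under lazy evaluation. It is a pure-Python toy model, not Spark code: `LazyDataset`, `expensive`, and `run` are hypothetical names made up for illustration. The idea it mimics is that each action re-runs the whole chain of transformations unless the result has been cached.

```python
from typing import Callable, Iterable, List, Optional


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not a Spark class).

    Transformations are only recorded; work happens when an action runs.
    """

    def __init__(self, source: Callable[[], Iterable[int]]):
        self._source = source                      # produces the data when called
        self._use_cache = False                    # set by cache()
        self._cached: Optional[List[int]] = None   # filled by the first action

    def map(self, fn: Callable[[int], int]) -> "LazyDataset":
        # Transformation: returns a new lazy dataset, computes nothing yet.
        parent = self
        return LazyDataset(lambda: (fn(x) for x in parent._compute()))

    def cache(self) -> "LazyDataset":
        # Only marks the dataset; data is stored when the first action runs.
        self._use_cache = True
        return self

    def _compute(self) -> List[int]:
        if self._cached is not None:
            return self._cached                    # reuse the stored result
        data = list(self._source())                # re-run the full chain
        if self._use_cache:
            self._cached = data
        return data

    # Actions: these trigger the actual computation.
    def count(self) -> int:
        return len(self._compute())

    def total(self) -> int:
        return sum(self._compute())


calls = {"n": 0}


def expensive(x: int) -> int:
    """Pretend this is a costly transformation; we count how often it runs."""
    calls["n"] += 1
    return x * x


def run(use_cache: bool) -> int:
    """Run two actions on the same dataset and return the number of expensive() calls."""
    calls["n"] = 0
    ds = LazyDataset(lambda: range(5)).map(expensive)
    if use_cache:
        ds.cache()
    ds.count()   # first action
    ds.total()   # second action
    return calls["n"]


uncached_calls = run(use_cache=False)
cached_calls = run(use_cache=True)
print(uncached_calls, cached_calls)  # prints "10 5"
```

Without the cache, the two actions each recompute the five elements; with it, the second action reuses the stored result. In Spark the same pattern would typically be to call `cache()` or `persist()` on an RDD or DataFrame that several actions reuse.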