If you are working with a relatively small dataset (< 1M entries) and you have to use Spark for some reason, you can get a significant speedup by tuning (lowering) the number of partitions.
In particular, setting `spark.default.parallelism` to the number of available cores and `spark.sql.shuffle.partitions` to something like 20 (instead of the default 200) gives a noticeable speedup, since Spark won’t waste time shuffling data across far more partitions than needed and generating a large number of tiny tasks.
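
A minimal sketch of what that configuration could look like when building the session; the app name and the concrete values here are just assumptions, so adjust them to your own core count and data size:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical setup for a small-dataset job: lower both parallelism knobs
// instead of relying on the defaults.
val spark = SparkSession.builder()
  .appName("small-dataset-job")                 // assumed app name
  .config("spark.default.parallelism", "8")     // e.g. the number of cores you actually have
  .config("spark.sql.shuffle.partitions", "20") // instead of the default 200
  .getOrCreate()
```

The same values can also be passed on the command line via `--conf` when submitting the job, which is handy if you want to experiment without changing code.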