- Version: 3.0.1, official documentation: link
- Subpackages
pyspark.sql module
pyspark.streaming module
pyspark.ml package
pyspark.mllib package
pyspark.resource module
- PySpark is the Python API for Spark. Public classes (a short usage sketch follows the list):
SparkContext:Main entry point for Spark functionality.
RDD:A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.
Broadcast:A broadcast variable that gets reused across tasks.
Accumulator:An “add-only” shared variable that tasks can only add values to.
SparkConf:For configuring Spark.
SparkFiles:Access files shipped with jobs.
StorageLevel:Finer-grained cache persistence levels.
TaskContext:Information about the currently running task, available on the workers; experimental.
RDDBarrier:Wraps an RDD under a barrier stage for barrier execution.
BarrierTaskContext:A TaskContext that provides extra info and tooling for barrier execution.
BarrierTaskInfo:Information about a barrier task.
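As an illustration of how several of these classes fit together, here is a minimal sketch combining SparkConf, SparkContext, Broadcast, and Accumulator. The app name, master URL, and sample data are placeholder assumptions, not taken from the original docs.

```python
from pyspark import SparkConf, SparkContext

# SparkConf: configure the application (placeholder app name / local master)
conf = SparkConf().setAppName("public-classes-demo").setMaster("local[*]")
# SparkContext: main entry point for Spark functionality
sc = SparkContext(conf=conf)

# Broadcast: read-only variable reused across tasks
lookup = sc.broadcast({"a": 1, "b": 2})
# Accumulator: "add-only" shared variable that tasks can only add to
counter = sc.accumulator(0)

def score(word):
    counter.add(1)                     # each task adds to the accumulator
    return lookup.value.get(word, 0)   # each task reads the broadcast value

# RDD: the basic abstraction in Spark
rdd = sc.parallelize(["a", "b", "c", "a"])
print(rdd.map(score).collect())        # [1, 2, 0, 1]
print(counter.value)                   # 4, once the action above has run

sc.stop()
```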
SparkConf
