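- Read CSV data and convert the longitude/latitude text in the corresponding columns into geometry-type data for processing.
- The data is a CSV file with a header row.
- 2. The test code is as follows: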
import findspark
findspark.init()  # locate the local Spark installation and add pyspark to sys.path

from pyspark.sql import SparkSession
from geospark.register import GeoSparkRegistrator
from geospark.utils import KryoSerializer, GeoSparkKryoRegistrator

# SparkSession using Kryo serialization with GeoSpark's Kryo registrator
spark = SparkSession.builder \
    .config("spark.serializer", KryoSerializer.getName) \
    .config("spark.kryo.registrator", GeoSparkKryoRegistrator.getName) \
    .getOrCreate()

# Register GeoSpark's SQL functions (ST_*) with this session
GeoSparkRegistrator.registerAll(spark)

inputLocation = r"D:\pycharm\pythonProject\GeoSpark\functions\taxi\taxi\taxi.csv"
# The file has a header row; without inferSchema every column is read as a string
df = spark.read.format("csv").option("header", "true").load(inputLocation)
df.createOrReplaceTempView("view")

# Concatenate longitude and latitude into WKT text, e.g. "Point(-73.99 40.73)"
textSpatialDf = spark.sql("""
select *, 'Point(' || view.pickup_longitude || ' ' || view.pickup_latitude || ')' as geom from view
""")
# textSpatialDf.select("geom").show(truncate=False)
textSpatialDf.createOrReplaceTempView("textView")

# Parse the WKT text in geom into a geometry column
spatialDf = spark.sql("""
select *, ST_PointFromText(textView.geom, 'WKT') as geometry from textView
""")
spatialDf.show()
spatialDf.printSchema()
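The first query only builds a WKT string by concatenating the two text columns. The step is easier to see outside Spark, so here is a minimal stdlib-only sketch of the same string construction. It assumes the header names `pickup_longitude` and `pickup_latitude` shown in the output. `to_wkt_point` and `wkt_points` are hypothetical helper names, not part of GeoSpark.

```python
import csv
import io

def to_wkt_point(lon, lat):
    """Mirror the SQL expression 'Point(' || lon || ' ' || lat || ')'.

    The coordinates stay as text, just as in the Spark query, so nothing
    is rounded at this stage.
    """
    return "Point(" + lon + " " + lat + ")"

def wkt_points(csv_text, lon_col="pickup_longitude", lat_col="pickup_latitude"):
    """Read CSV text with a header row and return one WKT string per data row."""
    reader = csv.DictReader(io.StringIO(csv_text))  # first line is the header
    return [to_wkt_point(row[lon_col], row[lat_col]) for row in reader]

# Two columns and one data row copied from the sample output
sample = (
    "pickup_longitude,pickup_latitude\n"
    "-73.994770000000003,40.736828000000003\n"
)
print(wkt_points(sample))  # ['Point(-73.994770000000003 40.736828000000003)']
```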
- The test results are as follows:
+---------+-------------------+-------------------+---------------+-------------------+-------------------+------------------+---------+------------------+-------------------+------------------+------------+-----------+---------+-------+------------------+------------------+------------------+--------------------+--------------------+
|vendor_id| pickup_datetime| dropoff_datetime|passenger_count| trip_distance| pickup_longitude| pickup_latitude|rate_code|store_and_fwd_flag| dropoff_longitude| dropoff_latitude|payment_type|fare_amount|surcharge|mta_tax| tip_amount| tolls_amount| total_amount| geom| geometry|
+---------+-------------------+-------------------+---------------+-------------------+-------------------+------------------+---------+------------------+-------------------+------------------+------------+-----------+---------+-------+------------------+------------------+------------------+--------------------+--------------------+
| CMT|2014-01-09 20:45:25|2014-01-09 20:52:31| 1|0.69999999999999996|-73.994770000000003|40.736828000000003| 1| N|-73.982226999999995|40.731789999999997| CRD| 6.5| 0.5| 0.5|1.3999999999999999| 0|8.9000000000000004|Point(-73.9947700...|POINT (-73.99477 ...|
| CMT|2014-01-09 20:46:12|2014-01-09 20:55:12| 1| 1.3999999999999999|-73.982392000000004|40.773381999999998| 1| N|-73.960448999999997|40.763995000000001| CRD| 8.5| 0.5| 0.5|1.8999999999999999| 0| 11.4|Point(-73.9823920...|POINT (-73.982392...|
| CMT|2014-01-09 20:44:47|2014-01-09 20:59:46| 2| 2.2999999999999998|-73.988569999999996|40.739406000000002| 1| N|-73.986626000000001| 40.765217| CRD| 11.5| 0.5| 0.5| 1.5| 0| 14|Point(-73.9885699...|POINT (-73.98857 ...|
| CMT|2014-01-09 20:44:57|2014-01-09 20:51:40| 1| 1.7|-73.960212999999996|40.770463999999997| 1| N|-73.979862999999995|40.777050000000003| CRD| 7.5| 0.5| 0.5| 1.7| 0|10.199999999999999|Point(-73.9602129...|POINT (-73.960213...|
| CMT|2014-01-09 20:47:09|2014-01-09 20:53:32| 1|0.90000000000000002|-73.995371000000006|40.717247999999998| 1| N|-73.984367000000006|40.720523999999997| CRD| 6| 0.5| 0.5| 1.75| 0| 8.75|Point(-73.9953710...|POINT (-73.995371...|
| CMT|2014-01-09 20:45:07|2014-01-09 20:51:01| 1|0.90000000000000002|-73.983811000000003|40.749654999999997| 1| N|-73.989746999999994|40.756574999999998| CRD| 6| 0.5| 0.5|1.3999999999999999| 0|8.4000000000000004|Point(-73.9838110...|POINT (-73.983811...|
| CMT|2014-01-09 20:44:04|2014-01-09 21:05:45| 1| 3.6000000000000001|-73.984138000000002|40.726317000000002| 1| N|-73.962868999999998| 40.758443| CRD| 16.5| 0.5| 0.5| 5.25| 0| 22.75|Point(-73.9841380...|POINT (-73.984138...|
| CMT|2014-01-09 20:43:23|2014-01-09 20:52:07| 1| 2.1000000000000001| -73.979906|40.745849999999997| 1| N|-73.959090000000003|40.773639000000003| CRD| 9| 0.5| 0.5| 2| 0| 12|Point(-73.979906 ...|POINT (-73.979906...|
| CMT|2014-01-09 20:43:04|2014-01-09 20:54:29| 1| 3.3999999999999999|-73.981147000000007|40.758918000000001| 1| N|-73.942509999999999|40.785975000000001| CRD| 12| 0.5| 0.5|2.6000000000000001| 0| 15.6|Point(-73.9811470...|POINT (-73.981147...|
| CMT|2014-01-09 20:50:23|2014-01-09 20:58:10| 1| 2.2999999999999998|-73.955192999999994|40.765467999999998| 1| N|-73.979022999999998|40.740577999999999| CRD| 9| 0.5| 0.5| 1| 0| 11|Point(-73.9551929...|POINT (-73.955193...|
| CMT|2014-01-09 20:51:36|2014-01-09 21:15:07| 1| 9.5|-73.885274999999993|40.773048000000003| 1| N|-73.980879000000002|40.777383999999998| CRD| 28.5| 0.5| 0.5| 6.96|5.3300000000000001|41.789999999999999|Point(-73.8852749...|POINT (-73.885275...|
| CMT|2014-01-09 20:48:04|2014-01-09 21:01:37| 1| 3.2999999999999998|-73.991782000000001| 40.748911| 1| N|-73.988359000000003| 40.714205| CRD| 12.5| 0.5| 0.5|4.0499999999999998| 0|17.550000000000001|Point(-73.9917820...|POINT (-73.991782...|
| CMT|2014-01-09 20:47:49|2014-01-09 20:56:11| 2| 1.8|-73.965716999999998|40.758674999999997| 1| N|-73.984059000000002|40.737448000000001| CRD| 8.5| 0.5| 0.5|1.8999999999999999| 0| 11.4|Point(-73.9657169...|POINT (-73.965717...|
| CMT|2014-01-09 20:48:47|2014-01-09 20:56:52| 2| 1.3999999999999999|-73.977008999999995|40.751620000000003| 1| N|-73.982642999999996|40.766573999999999| CRD| 7.5| 0.5| 0.5| 1.7| 0|10.199999999999999|Point(-73.9770089...|POINT (-73.977009...|
| CMT|2014-01-09 20:47:51|2014-01-09 21:02:31| 3| 2.6000000000000001|-73.977655999999996|40.753680000000003| 1| N|-73.952248999999995| 40.777676| CRD| 12.5| 0.5| 0.5| 1| 0| 14.5|Point(-73.9776559...|POINT (-73.977656...|
| CMT|2014-01-09 20:49:49|2014-01-09 21:20:38| 1| 11.199999999999999|-73.788265999999993|40.647542000000001| 1| N|-73.949224999999998|40.652700000000003| CRD| 35.5| 0.5| 0.5| 0| 0| 36.5|Point(-73.7882659...|POINT (-73.788266...|
| CMT|2014-01-09 16:51:35|2014-01-09 17:00:17| 1| 1.7| -74.007503|40.725991999999998| 1| N|-73.988181999999995|40.734583000000001| CRD| 8.5| 1| 0.5| 2| 0| 12|Point(-74.007503 ...|POINT (-74.007503...|
| CMT|2014-01-09 16:43:29|2014-01-09 16:59:15| 1| 4.7000000000000002|-74.014865999999998| 40.709353| 1| N|-73.986084000000005|40.759081000000002| CRD| 16| 1| 0.5| 4| 0| 21.5|Point(-74.0148659...|POINT (-74.014866...|
| CMT|2014-01-09 16:46:50|2014-01-09 16:56:41| 1| 1.6000000000000001| -73.967675| 40.763109| 1| N|-73.952590999999998|40.778185999999998| CRD| 9| 1| 0.5|2.1000000000000001| 0| 12.6|Point(-73.967675 ...|POINT (-73.967675...|
| CMT|2014-01-09 16:47:00|2014-01-09 17:37:58| 1| 17.899999999999999|-73.781730999999994|40.644728999999998| 2| N|-73.978604000000004|40.761822000000002| CRD| 52| 0| 0.5| 11.56|5.3300000000000001|69.390000000000001|Point(-73.7817309...|POINT (-73.781731...|
+---------+-------------------+-------------------+---------------+-------------------+-------------------+------------------+---------+------------------+-------------------+------------------+------------+-----------+---------+-------+------------------+------------------+------------------+--------------------+--------------------+
only showing top 20 rows
root
|-- vendor_id: string (nullable = true)
|-- pickup_datetime: string (nullable = true)
|-- dropoff_datetime: string (nullable = true)
|-- passenger_count: string (nullable = true)
|-- trip_distance: string (nullable = true)
|-- pickup_longitude: string (nullable = true)
|-- pickup_latitude: string (nullable = true)
|-- rate_code: string (nullable = true)
|-- store_and_fwd_flag: string (nullable = true)
|-- dropoff_longitude: string (nullable = true)
|-- dropoff_latitude: string (nullable = true)
|-- payment_type: string (nullable = true)
|-- fare_amount: string (nullable = true)
|-- surcharge: string (nullable = true)
|-- mta_tax: string (nullable = true)
|-- tip_amount: string (nullable = true)
|-- tolls_amount: string (nullable = true)
|-- total_amount: string (nullable = true)
|-- geom: string (nullable = true)
|-- geometry: geometry (nullable = false)
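The schema shows that `geom` is still a plain string, while `geometry` is a real geometry column. The table shows the practical difference: `geom` keeps the full 17-digit text (`-73.994770000000003`), but `geometry` prints `-73.99477`. Parsing the WKT turns each coordinate into a double, which is then printed in a shorter form. The sketch below reproduces that parsing step for a 2D point. `parse_wkt_point` is a hypothetical, deliberately minimal parser that handles only `POINT (x y)`; it is not GeoSpark's WKT reader.

```python
import re

# Matches "Point(x y)" or "POINT (x y)"; WKT keywords are case-insensitive.
# Only 2D points are accepted (no Z/M values, no EMPTY).
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*([^\s()]+)\s+([^\s()]+)\s*\)\s*$", re.IGNORECASE
)

def parse_wkt_point(text):
    """Return (x, y) as floats from a 2D WKT POINT string."""
    match = _POINT_RE.match(text)
    if match is None:
        raise ValueError(f"not a 2D WKT point: {text!r}")
    return float(match.group(1)), float(match.group(2))

x, y = parse_wkt_point("Point(-73.994770000000003 40.736828000000003)")
print(x, y)  # doubles, printed in Python's shortest round-trip form
```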
Of course, other methods can also be used for this conversion.
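One such alternative follows the pattern in the GeoSpark SQL documentation. It skips the WKT string entirely and builds the point with `ST_Point`. Because the CSV columns are strings, each one is first cast to `Decimal(24,20)`. The query is just SQL text, so this sketch only assembles it as a string. `build_point_query` is a hypothetical helper; its result is meant to be passed to `spark.sql(...)` after `GeoSparkRegistrator.registerAll(spark)` has run, as in the script above.

```python
def build_point_query(view, lon_col, lat_col, out_col="geometry"):
    """Return a GeoSpark SQL query that builds points with ST_Point.

    The CSV columns are strings, so each coordinate is cast to
    Decimal(24,20) before being passed to ST_Point.
    """
    return (
        f"select *, ST_Point("
        f"CAST({view}.{lon_col} AS Decimal(24,20)), "
        f"CAST({view}.{lat_col} AS Decimal(24,20))) AS {out_col} "
        f"from {view}"
    )

query = build_point_query("view", "pickup_longitude", "pickup_latitude")
print(query)
# In the script above, this would run as: spatialDf = spark.sql(query)
```

Compared with the WKT route, this avoids building and re-parsing an intermediate text column, at the cost of an explicit numeric cast.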