Connecting to a database with the Spark JDBC connector

Testing with spark-shell

SPARK_CLASSPATH=/usr/share/java/mysql-connector-java.jar spark-shell

Launching with the --driver-class-path option instead fails during computation with "Did not find registered driver with class com.mysql.jdbc.Driver". My guess is that this happens because YARN does not load mysql-connector-java.jar onto the executors by default.
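A possible workaround (a sketch I have not verified in this setup, using the same jar path as above) is the --jars option, which ships the listed jars to the YARN containers as well as the driver:

```shell
# Distribute the MySQL JDBC driver jar to the driver and to the
# YARN executors, so the driver class is registered on both sides.
spark-shell --jars /usr/share/java/mysql-connector-java.jar
```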

Creating a DataFrame

scala> val df = sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:mysql://db_host/?user=secret&password=secret","dbtable" -> "db.table")).load()
df: org.apache.spark.sql.DataFrame = [id: string, name: string, age: int]
scala> df.printSchema
root
 |-- id: string (nullable = false)
 |-- name: string (nullable = true)
 |-- age: integer (nullable = true)
scala> df.count
res0: Long = 7
scala> df.registerTempTable("people")
scala> sqlContext.sql("select count(*) from people").collect
res2: Array[org.apache.spark.sql.Row] = Array([7])
scala> sqlContext.sql("select name, age from people").collect
res3: Array[org.apache.spark.sql.Row] = Array([name,32], [name,32], [name,32], [name,32], [name,32], [name,32], [name,32])
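For larger tables, the JDBC source can also read in parallel by splitting the table on a numeric column. A minimal sketch against the Spark 1.6 API, assuming a numeric partition column named id and the same connection URL as above (the bounds and partition count here are illustrative, not taken from the original session):

```scala
// Partitioned JDBC read: Spark issues one range query per partition on the
// partition column, so the table is fetched by several executors at once.
val dfPar = sqlContext.read.format("jdbc").options(Map(
  "url"             -> "jdbc:mysql://db_host/?user=secret&password=secret",
  "dbtable"         -> "db.table",
  "partitionColumn" -> "id",   // assumed numeric; the sample table's id is a string
  "lowerBound"      -> "1",    // illustrative bounds covering the id range
  "upperBound"      -> "1000",
  "numPartitions"   -> "4"     // number of concurrent range queries
)).load()
```

Note that lowerBound and upperBound only control how the id range is split into partitions; rows outside the bounds are still read, into the first and last partitions.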

References

  1. Spark 1.6 official documentation
  2. Stack Overflow: how to connect Spark to PostgreSQL
  3. Spark SQL official documentation (Chinese translation)