$ bin/solr -e cloud
Welcome to the SolrCloud example!
This interactive session will help you launch a SolrCloud cluster on your local workstation.
To begin, how many Solr nodes would you like to run in your local cluster? (specify 1-4 nodes) [2]:
2
Ok, let's start up 2 Solr nodes for your example SolrCloud cluster.
Please enter the port for node1 [8983]:
8983
Please enter the port for node2 [7574]:
7574
Creating Solr home directory /home/ec2-user/solr-6.3.0/example/cloud/node1/solr
Cloning /home/ec2-user/solr-6.3.0/example/cloud/node1 into
/home/ec2-user/solr-6.3.0/example/cloud/node2
Starting up Solr on port 8983 using command:
bin/solr start -cloud -p 8983 -s "example/cloud/node1/solr"
Waiting up to 180 seconds to see Solr running on port 8983 [|]
Started Solr server on port 8983 (pid=8625). Happy searching!
Starting up Solr on port 7574 using command:
bin/solr start -cloud -p 7574 -s "example/cloud/node2/solr" -z localhost:9983
Waiting up to 180 seconds to see Solr running on port 7574 []
Started Solr server on port 7574 (pid=8840). Happy searching!
Now let's create a new collection for indexing documents in your 2-node cluster.
Please provide a name for your new collection: [gettingstarted]
gettingstarted
How many shards would you like to split gettingstarted into? [2]
2
How many replicas per shard would you like to create? [2]
2
Please choose a configuration for the gettingstarted collection, available options are:
basic_configs, data_driven_schema_configs, or sample_techproducts_configs [data_driven_schema_configs]
basic_configs
Connecting to ZooKeeper at localhost:9983 ...
Uploading /home/ec2-user/solr-6.3.0/server/solr/configsets/basic_configs/conf for config gettingstarted to ZooKeeper at localhost:9983
Creating new collection 'gettingstarted' using command:
$ tar xvf spark-2.0.2-bin-hadoop2.7.tgz
spark-2.0.2-bin-hadoop2.7/
spark-2.0.2-bin-hadoop2.7/NOTICE
spark-2.0.2-bin-hadoop2.7/jars/
... (omitted) ...
spark-2.0.2-bin-hadoop2.7/yarn/
spark-2.0.2-bin-hadoop2.7/yarn/spark-2.0.2-yarn-shuffle.jar
spark-2.0.2-bin-hadoop2.7/README.md
$ cd spark-2.0.2-bin-hadoop2.7
■ Messing around in Python
$ bin/pyspark
Python 2.7.12 (default, Sep 1 2016, 22:14:00)
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/12/19 06:24:10 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.0.2
      /_/
Using Python version 2.7.12 (default, Sep 1 2016 22:14:00)
SparkSession available as 'spark'.
>>> textFile = sc.textFile("README.md")
>>> textFile.count()
99
>>> textFile.first()
u'# Apache Spark'
Getting Started with spark-solr
■ Installing the spark-solr package → ★Failed★
$ bin/spark-shell --packages "com.lucidworks.spark:spark-solr:3.0.0-alpha"
Ivy Default Cache set to: /home/ec2-user/.ivy2/cache
The jars for the packages stored in: /home/ec2-user/.ivy2/jars
:: loading settings :: url = jar:file:/home/ec2-user/spark-2.0.2-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.lucidworks.spark#spark-solr added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found com.lucidworks.spark#spark-solr;3.0.0-alpha in central
... (omitted) ...
==== spark-packages: tried
http://dl.bintray.com/spark-packages/maven/org/restlet/jee/org.restlet.ext.servlet/2.3.0/org.restlet.ext.servlet-2.3.0.pom
-- artifact org.restlet.jee#org.restlet.ext.servlet;2.3.0!org.restlet.ext.servlet.jar:
http://dl.bintray.com/spark-packages/maven/org/restlet/jee/org.restlet.ext.servlet/2.3.0/org.restlet.ext.servlet-2.3.0.jar
::::::::::::::::::::::::::::::::::::::::::::::
:: UNRESOLVED DEPENDENCIES ::
::::::::::::::::::::::::::::::::::::::::::::::
:: org.restlet.jee#org.restlet;2.3.0: not found
:: org.restlet.jee#org.restlet.ext.servlet;2.3.0: not found
::::::::::::::::::::::::::::::::::::::::::::::
$ echo $?
1
$ bin/spark-shell --packages "com.lucidworks.spark:spark-solr:3.0.0-alpha"
Ivy Default Cache set to: /home/ec2-user/.ivy2/cache
The jars for the packages stored in: /home/ec2-user/.ivy2/jars
:: loading settings :: url = jar:file:/home/ec2-user/spark-2.0.2-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.lucidworks.spark#spark-solr added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found com.lucidworks.spark#spark-solr;3.0.0-alpha in central
... (omitted) ...
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 173 | 2 | 2 | 24 || 149 | 1 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
confs: [default]
149 artifacts copied, 0 already retrieved (79674kB/104ms)
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
2016-12-19 07:36:40,971 [main] WARN NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-12-19 07:36:41,955 [main] WARN SparkContext - Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://172.31.14.210:4040
Spark context available as 'sc' (master = local[*], app id = local-1482133001755).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.2
      /_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_111)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val df = spark.read.format("solr").options(options).load
2016-12-19 07:47:26,557 [main] WARN ObjectStore - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
2016-12-19 07:47:26,651 [main] WARN ObjectStore - Failed to get database default, returning NoSuchObjectException
2016-12-19 07:47:27,479 [main] ERROR RetryingHMSHandler - AlreadyExistsException(message:Database default already exists)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:891)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
... (omitted) ...
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
df: org.apache.spark.sql.DataFrame = [*_ancestor_path: string, *_txt_hi: string ... 72 more fields]
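The options map passed to spark.read above isn't shown in this transcript. As a rough sketch (the ZooKeeper address and collection name are assumptions, not taken from the transcript), it would look something like this, using the zkhost and collection options that the spark-solr data source understands:

// Hedged sketch of the options map assumed by the read above.
// zkhost: the embedded ZooKeeper started by the SolrCloud example (assumption).
// collection: placeholder; the transcript does not show which collection was read.
val options = Map(
  "zkhost"     -> "localhost:9983",
  "collection" -> "gettingstarted"
)
val df = spark.read.format("solr").options(options).load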
※ For now, I'll just push ahead as-is!
So, I'll shut down the Solr instances that are currently running.
$ bin/solr stop -all
Sending stop command to Solr running on port 8983 ... waiting up to 180 seconds to allow Jetty process 8625 to stop gracefully.
Sending stop command to Solr running on port 7574 ... waiting up to 180 seconds to allow Jetty process 8840 to stop gracefully.
$ bin/solr -c && bin/solr create -c test-spark-solr -shards 2
Waiting up to 180 seconds to see Solr running on port 8983 []
Started Solr server on port 8983 (pid=32395). Happy searching!
Connecting to ZooKeeper at localhost:9983 ...
Uploading /home/ec2-user/solr-6.3.0/server/solr/configsets/data_driven_schema_configs/conf for config test-spark-solr to ZooKeeper at localhost:9983
Creating new collection 'test-spark-solr' using command:
scala> import org.apache.spark.SparkContext
import org.apache.spark.SparkContext
scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext
scala> val conf = new SparkConf().setAppName("Spark SQL from MySQL").setMaster("local[*]")
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@1eb207c3
scala> val sc = new SparkContext(conf)
org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
... (omitted) ...
at org.apache.spark.SparkContext.<init>(SparkContext.scala:86)
... 48 elided
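The exception above is expected: spark-shell already creates a SparkContext and exposes it as sc (and a SparkSession as spark), and only one SparkContext may run per JVM. The idiomatic way forward is to reuse the existing context rather than building a new one, which is exactly what the next step does. A minimal sketch:

// Reuse the SparkContext that spark-shell already created as `sc`
// instead of constructing a new one (only one may exist per JVM).
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)   // deprecated in Spark 2.x; spark.sqlContext also works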
scala> val sqlContext = new SQLContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@1cd853ee
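The transcript doesn't show how the trips table queried below was set up. Presumably a Solr-backed DataFrame was registered as a temp table, roughly along these lines (the ZooKeeper address, collection name, and registration call are assumptions based on the earlier steps, not shown in the original):

// Hedged sketch: register a Solr-backed DataFrame as the "trips" temp table.
// Collection name and ZooKeeper address are assumptions from the earlier steps.
val options = Map(
  "zkhost"     -> "localhost:9983",
  "collection" -> "test-spark-solr"
)
val trips = sqlContext.read.format("solr").options(options).load
trips.registerTempTable("trips")   // createOrReplaceTempView("trips") is the Spark 2.x equivalent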
scala> sqlContext.sql("SELECT avg(tip_amount), avg(fare_amount) FROM trips").show()
+------------------+------------------+
| avg(tip_amount)| avg(fare_amount)|
+------------------+------------------+
|1.5674853801169588|11.802144249512668|
+------------------+------------------+
scala> sqlContext.sql("SELECT max(tip_amount), max(fare_amount) FROM trips WHERE trip_distance > 10").show()
+---------------+----------------+
|max(tip_amount)|max(fare_amount)|
+---------------+----------------+
| 16.44| 68.0|
+---------------+----------------+