java.net.BindException when creating a pyspark SparkContext on macOS

When creating a local pyspark SparkContext from a Jupyter notebook on macOS, you may run into a java.net.BindException like the one below.

import os
from pyspark.sql import SparkSession

spark_home = os.environ.get('SPARK_HOME', None)

# Chained builder calls; the driver tries to bind a port inside getOrCreate().
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("spark")
    .config("spark.driver.memory", "8g")
    .config("spark.executor.memory", "8g")
    .config("spark.python.worker.memory", "8g")
    .getOrCreate()
)

sc = spark.sparkContext

ERROR SparkContext: Error initializing SparkContext.
java.net.BindException: Can't assign requested address: Service 'sparkDriver' failed after 16 retries (on a random free port)! Consider explicitly setting the appropriate binding address for the service 'sparkDriver' (for example spark.driver.bindAddress for SparkDriver) to the correct binding address.
    at java.base/sun.nio.ch.Net.bind0(Native Method)
    at java.base/sun.nio.ch.Net.bind(Net.java:469)
    at java.base/sun.nio.ch.Net.bind(Net.java:458)
    at java.base/sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:220)
    at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:132)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:551)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1346)
    at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:503)
    at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:488)
    at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:985)
    at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:247)
    at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:344)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:510)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:518)
    at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:830)

This happens because the macOS hostname does not resolve to localhost, and it can be fixed by setting the SPARK_LOCAL_IP environment variable.
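A quick way to confirm that this is the cause is to try resolving the hostname yourself. The diagnostic sketch below is not from the original post; it only uses Python's standard socket module.

import socket

# If the Mac's hostname doesn't resolve to an address, the Spark driver
# cannot pick a bind address and fails with the BindException above.
hostname = socket.gethostname()
try:
    print(hostname, "->", socket.gethostbyname(hostname))
except socket.gaierror as err:
    print(f"'{hostname}' does not resolve: {err}")  # the failing case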

One way to set it is to assign the environment variable in Python code, before the SparkSession is created:

os.environ["SPARK_LOCAL_IP"] = "127.0.0.1"
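Note that the assignment must run before getOrCreate(), because the driver JVM reads SPARK_LOCAL_IP when it starts. A minimal sketch of the ordering:

import os
os.environ["SPARK_LOCAL_IP"] = "127.0.0.1"  # must be set before the JVM is launched

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("spark")
    .getOrCreate()  # the driver now binds to 127.0.0.1
)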

Another is to export it in the shell (adding the line to your shell profile, e.g. ~/.zshrc, makes it persistent):

export SPARK_LOCAL_IP="127.0.0.1"

Or you can set it in Spark's configuration file, spark-env.sh. If $SPARK_HOME/conf/spark-env.sh does not exist yet, it can be copied from the spark-env.sh.template that ships with Spark:

> vi $SPARK_HOME/conf/spark-env.sh

...
# Options read by executors and drivers running inside the cluster
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
# - SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
# - MESOS_NATIVE_JAVA_LIBRARY, to point to your libmesos.so if you use Mesos

SPARK_LOCAL_IP="127.0.0.1"
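As the error message itself suggests, an alternative is to set the binding address explicitly through the spark.driver.bindAddress configuration instead of an environment variable; a minimal sketch:

from pyspark.sql import SparkSession

# Bind the driver explicitly, as suggested by the BindException message.
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("spark")
    .config("spark.driver.bindAddress", "127.0.0.1")
    .getOrCreate()
)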
