How to join more than 2 regions with Apache Geode?

I've been trying to query some regions and I'm failing to join more than 2 of them. I set this up in a Java test to run the queries more easily, but it fails all the same in Pulse.
@Test
public void test_geode_join() throws QueryException {
    ClientCache cache = new ClientCacheFactory()
        .addPoolLocator(HOST, LOCATOR_PORT)
        .setPoolSubscriptionEnabled(true)
        .setPdxSerializer(new MyReflectionBasedAutoSerializer())
        .create();
    {
        @SuppressWarnings("unchecked")
        SelectResults<StructImpl> r = (SelectResults<StructImpl>) cache.getQueryService()
            .newQuery("SELECT itm.itemId, bx.boxId " +
                "FROM /items itm, /boxs bx " +
                "WHERE itm.boxId = bx.boxId " +
                "LIMIT 5")
            .execute();
        for (StructImpl v : r) {
            System.out.println(v);
        }
    }
    {
        @SuppressWarnings("unchecked")
        SelectResults<StructImpl> r = (SelectResults<StructImpl>) cache.getQueryService()
            .newQuery("SELECT bx.boxId, rm.roomId " +
                "FROM /boxs bx, /rooms rm " +
                "WHERE bx.roomId = rm.roomId " +
                "LIMIT 5")
            .execute();
        for (StructImpl v : r) {
            System.out.println(v);
        }
    }
    {
        // This query fails
        @SuppressWarnings("unchecked")
        SelectResults<StructImpl> r = (SelectResults<StructImpl>) cache.getQueryService()
            .newQuery("SELECT itm.itemId, bx.boxId, rm.roomId " +
                "FROM /items itm, /boxs bx, /rooms rm " +
                "WHERE itm.boxId = bx.boxId " +
                "AND bx.roomId = rm.roomId " +
                "LIMIT 5")
            .execute();
        for (StructImpl v : r) {
            System.out.println(v);
        }
    }
}
The first two queries work fine and respond in an instant, but the last query hangs until it times out. I get the following logs:
[warn 2018/02/06 17:33:17.155 CET <main> tid=0x1] Pool unexpected socket timed out on client connection=Pooled Connection to hostname:31902: Connection[hostname:31902]#1978504976)
[warn 2018/02/06 17:33:27.333 CET <main> tid=0x1] Pool unexpected socket timed out on client connection=Pooled Connection to hostname2:31902: Connection[hostname2:31902]#1620459733 attempt=2)
[warn 2018/02/06 17:33:37.588 CET <main> tid=0x1] Pool unexpected socket timed out on client connection=Pooled Connection to hostname3:31902: Connection[hostname3:31902]#422409467 attempt=3)
[warn 2018/02/06 17:33:37.825 CET <main> tid=0x1] Pool unexpected socket timed out on client connection=Pooled Connection to hostname3:31902: Connection[hostname3:31902]#422409467 attempt=3). Server unreachable: could not connect after 3 attempts
[info 2018/02/06 17:33:37.840 CET <Distributed system shutdown hook> tid=0xd] VM is exiting - shutting down distributed system
[info 2018/02/06 17:33:37.840 CET <Distributed system shutdown hook> tid=0xd] GemFireCache[id = 1839168128; isClosing = true; isShutDownAll = false; created = Tue Feb 06 17:33:05 CET 2018; server = false; copyOnRead = false; lockLease = 120; lockTimeout = 60]: Now closing.
[info 2018/02/06 17:33:37.887 CET <Distributed system shutdown hook> tid=0xd] Destroying connection pool DEFAULT
And it ends up crashing with:
org.apache.geode.cache.client.ServerConnectivityException: Pool unexpected socket timed out on client connection=Pooled Connection to hostname3:31902: Connection[hostname3:31902]#422409467 attempt=3). Server unreachable: could not connect after 3 attempts
at org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:798)
at org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:623)
at org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:174)
at org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:115)
at org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:763)
at org.apache.geode.cache.client.internal.QueryOp.execute(QueryOp.java:58)
at org.apache.geode.cache.client.internal.ServerProxy.query(ServerProxy.java:70)
at org.apache.geode.cache.query.internal.DefaultQuery.executeOnServer(DefaultQuery.java:456)
at org.apache.geode.cache.query.internal.DefaultQuery.execute(DefaultQuery.java:338)
at org.apache.geode.cache.query.internal.DefaultQuery.execute(DefaultQuery.java:319)
at local.test.geode.GeodeTest.test_geode_join(GeodeTest.java:226)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:538)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:760)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:460)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:206)
I tried setting the timeouts to 60 seconds but I'm still not getting any results.
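For reference, this is roughly how I set the 60-second timeout (a sketch of the same client setup as above, with the pool read timeout raised; the default read timeout is 10 seconds, which matches the roughly 10-second gaps between the warnings above):
ClientCache cache = new ClientCacheFactory()
    .addPoolLocator(HOST, LOCATOR_PORT)
    .setPoolSubscriptionEnabled(true)
    .setPoolReadTimeout(60_000) // read timeout in milliseconds (default is 10_000)
    .setPdxSerializer(new MyReflectionBasedAutoSerializer())
    .create();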
All regions are configured like this:
Type   | Name            | Value
------ | --------------- | --------------------
Region | data-policy     | PERSISTENT_REPLICATE
       | disk-store-name | regionDiskStore1
       | size            | 1173
       | scope           | distributed-ack
Am I missing anything here?

Based on all the information provided, it looks like you are doing everything correctly. I tried to reproduce this in a similar test (shown below) and the query returns 5 results. However, if one of the predicates rarely matches, the query can take much longer to join enough rows to find tuples that satisfy the limit.
Below is a sample test that does not have the issue. But if I modify it to put only portfolios with ID = -1 into region3, the test "hangs" trying to find 5 rows that fulfill the search criteria: it has to join 1000 * 1000 * 1000 rows, which takes a while, and in the end the query will never find a p3.ID equal to p1.ID. Is it possible that your itm.boxId values just do not match bx.boxId often enough, so it takes a lot longer to find ones that do?
public void testJoinMultipleReplicatePersistentRegionsWithLimitClause() throws Exception {
    String regionName = "portfolio";
    Cache cache = serverStarterRule.getCache();
    assertNotNull(cache);
    Region region1 =
        cache.createRegionFactory(RegionShortcut.REPLICATE_PERSISTENT).create(regionName + 1);
    Region region2 =
        cache.createRegionFactory(RegionShortcut.REPLICATE_PERSISTENT).create(regionName + 2);
    Region region3 =
        cache.createRegionFactory(RegionShortcut.REPLICATE_PERSISTENT).create(regionName + 3);
    for (int i = 0; i < 1000; i++) {
        Portfolio p = new Portfolio(i);
        region1.put(i, p);
        region2.put(i, p);
        region3.put(i, p); // change to region3.put(i, new Portfolio(-1)) to make the query take much longer
    }
    QueryService queryService = cache.getQueryService();
    SelectResults results = (SelectResults) queryService
        .newQuery("select p1.ID, p2.ID, p3.ID from /portfolio1 p1, /portfolio2 p2, /portfolio3 p3 "
            + "where p1.ID = p2.ID and p3.ID = p1.ID limit 5")
        .execute();
    assertEquals(5, results.size());
}
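If the join predicates do match and the query is simply slow, creating indexes on the join fields on the server side should also speed it up considerably. A sketch using the region and field names from your question (the index names are arbitrary; the same can be done with gfsh "create index"):
// Executed against the server cache, not the client cache
QueryService queryService = cache.getQueryService();
queryService.createIndex("itemBoxIdIndex", "itm.boxId", "/items itm");
queryService.createIndex("boxBoxIdIndex", "bx.boxId", "/boxs bx");
queryService.createIndex("boxRoomIdIndex", "bx.roomId", "/boxs bx");
queryService.createIndex("roomRoomIdIndex", "rm.roomId", "/rooms rm");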

Related

Array type in clickhouseIO for apache beam(dataflow)

I am using Apache Beam to consume JSON and insert it into ClickHouse.
I am currently having a problem with the Array data type.
Everything works fine until I add an array-typed field:
Schema.Field.of("inputs.value", Schema.FieldType.array(Schema.FieldType.INT64).withNullable(true))
Code for the transformations
p.apply(transformNameSuffix + "ReadFromPubSub",
        PubsubIO.readStrings()
            .fromSubscription(chainConfig.getPubSubSubscriptionPrefix() + "transactions")
            .withIdAttribute(PUBSUB_ID_ATTRIBUTE))
    .apply(transformNameSuffix + "ReadFromPubSub", ParDo.of(new DoFn<String, Row>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
            String item = c.element();
            //System.out.print(item);
            Transaction transaction = JsonUtils.parseJson(item, Transaction.class);
            c.output(Row.withSchema(Schemas.TRANSACTIONS)
                .addValues(*****,
                    *****
                    .......
                    transaction.getInputValues())
                .build());
        }
    }))
    .setRowSchema(Schemas.TRANSACTIONS)
    .apply(ClickHouseIO.<Row>write(
            chainConfig.getClickhouseJDBCURI(),
            chainConfig.getTransactionsTable())
        .withMaxRetries(3)
        .withMaxInsertBlockSize(1)
        .withInitialBackoff(Duration.standardSeconds(5))
        .withInsertDeduplicate(true)
        .withInsertDistributedSync(false));
The method that generates the inputs:
public List<Long> getInputValues() {
    List<Long> values = Lists.newArrayList();
    for (TransactionInput eachInput : inputs) {
        System.out.print(eachInput.getValue());
        values.add(eachInput.getValue());
    }
    return values;
}
The error I am getting is:
ru.yandex.clickhouse.except.ClickHouseException: ClickHouse exception, code: 33, host: 35.202.46.77, port: 8123; Code: 33, e.displayText() = DB::Exception: Cannot read all data. Bytes read: 6. Bytes expected: 15. (version 19.17.4.11 (official build))
at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:58)
at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:28)
at ru.yandex.clickhouse.ClickHouseStatementImpl.checkForErrorAndThrow(ClickHouseStatementImpl.java:875)
at ru.yandex.clickhouse.ClickHouseStatementImpl.sendStream(ClickHouseStatementImpl.java:851)
at ru.yandex.clickhouse.Writer.send(Writer.java:106)
at ru.yandex.clickhouse.Writer.send(Writer.java:141)
at ru.yandex.clickhouse.ClickHouseStatementImpl.sendRowBinaryStream(ClickHouseStatementImpl.java:764)
at ru.yandex.clickhouse.ClickHouseStatementImpl.sendRowBinaryStream(ClickHouseStatementImpl.java:758)
at org.apache.beam.sdk.io.clickhouse.ClickHouseIO$WriteFn.flush(ClickHouseIO.java:427)
at org.apache.beam.sdk.io.clickhouse.ClickHouseIO$WriteFn.processElement(ClickHouseIO.java:411)
at org.apache.beam.sdk.io.clickhouse.AutoValue_ClickHouseIO_WriteFn$DoFnInvoker.invokeProcessElement(Unknown Source)
at org.apache.beam.repackaged.direct_java.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:222)
at org.apache.beam.repackaged.direct_java.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:183)
at org.apache.beam.repackaged.direct_java.runners.core.SimplePushbackSideInputDoFnRunner.processElementInReadyWindows(SimplePushbackSideInputDoFnRunner.java:78)
at org.apache.beam.runners.direct.ParDoEvaluator.processElement(ParDoEvaluator.java:216)
at org.apache.beam.runners.direct.DoFnLifecycleManagerRemovingTransformEvaluator.processElement(DoFnLifecycleManagerRemovingTransformEvaluator.java:54)
at org.apache.beam.runners.direct.DirectTransformExecutor.processElements(DirectTransformExecutor.java:160)
at org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:124)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Throwable: Code: 33, e.displayText() = DB::Exception: Cannot read all data. Bytes read: 6. Bytes expected: 15. (version 19.17.4.11 (official build))
at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:53)
... 22 more
Feb 06, 2020 9:04:38 PM org.apache.beam.sdk.io.clickhouse.ClickHouseIO$WriteFn flush
WARNING: Error writing to ClickHouse. Retry attempt[1]
ru.yandex.clickhouse.except.ClickHouseException: ClickHouse exception, code: 33, host: 35.202.46.77, port: 8123; Code: 33, e.displayText() = DB::Exception: Cannot read all data. Bytes read: 6. Bytes expected: 93. (version 19.17.4.11 (official build))
at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:58)
at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:28)
at ru.yandex.clickhouse.ClickHouseStatementImpl.checkForErrorAndThrow(ClickHouseStatementImpl.java:875)
at ru.yandex.clickhouse.ClickHouseStatementImpl.sendStream(ClickHouseStatementImpl.java:851)
at ru.yandex.clickhouse.Writer.send(Writer.java:106)
at ru.yandex.clickhouse.Writer.send(Writer.java:141)
at ru.yandex.clickhouse.ClickHouseStatementImpl.sendRowBinaryStream(ClickHouseStatementImpl.java:764)
at ru.yandex.clickhouse.ClickHouseStatementImpl.sendRowBinaryStream(ClickHouseStatementImpl.java:758)
at org.apache.beam.sdk.io.clickhouse.ClickHouseIO$WriteFn.flush(ClickHouseIO.java:427)
at org.apache.beam.sdk.io.clickhouse.ClickHouseIO$WriteFn.processElement(ClickHouseIO.java:411)
at org.apache.beam.sdk.io.clickhouse.AutoValue_ClickHouseIO_WriteFn$DoFnInvoker.invokeProcessElement(Unknown Source)
at org.apache.beam.repackaged.direct_java.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:222)
at org.apache.beam.repackaged.direct_java.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:183)
at org.apache.beam.repackaged.direct_java.runners.core.SimplePushbackSideInputDoFnRunner.processElementInReadyWindows(SimplePushbackSideInputDoFnRunner.java:78)
at org.apache.beam.runners.direct.ParDoEvaluator.processElement(ParDoEvaluator.java:216)
at org.apache.beam.runners.direct.DoFnLifecycleManagerRemovingTransformEvaluator.processElement(DoFnLifecycleManagerRemovingTransformEvaluator.java:54)
at org.apache.beam.runners.direct.DirectTransformExecutor.processElements(DirectTransformExecutor.java:160)
at org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:124)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Throwable: Code: 33, e.displayText() = DB::Exception: Cannot read all data. Bytes read: 6. Bytes expected: 93. (version 19.17.4.11 (official build))
at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:53)
... 22 more
Feb 06, 2020 9:04:39 PM org.apache.beam.sdk.io.clickhouse.ClickHouseIO$WriteFn flush
WARNING: Error writing to ClickHouse. Retry attempt[1]
ru.yandex.clickhouse.except.ClickHouseException: ClickHouse exception, code: 33, host: 35.202.46.77, port: 8123; Code: 33, e.displayText() = DB::Exception: Cannot read all data. Bytes read: 5. Bytes expected: 2641. (version 19.17.4.11 (official build)
ClickHouse schema:
CREATE TABLE IF NOT EXISTS transactions_streaming_small (
    *****,
    *****,
    inputs Nested ( value Nullable(UInt64) )
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(block_date_time)
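For reference, my understanding is that ClickHouse stores a Nested column as parallel array columns with dotted names, so the inputs column above is effectively exposed as:
inputs.value Array(Nullable(UInt64))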
What is the problem?
[ClickHouse version 19.17.4.11 (official build)]

LogisticRegression fit() function is throwing this error

I'm following the DataCamp PySpark tutorial series, and in chapter 04 (Model tuning and selection), while fitting the model, I get this error when I execute this line:
best_lr = lr.fit(training)
Error
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<ipython-input-102-88042cb88c20> in <module>()
----> 1 best_lr = lr.fit(training)
/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/ml/base.py in fit(self, dataset, params)
130 return self.copy(params)._fit(dataset)
131 else:
--> 132 return self._fit(dataset)
133 else:
134 raise ValueError("Params must be either a param map or a list/tuple of param maps, "
/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/ml/wrapper.py in _fit(self, dataset)
286
287 def _fit(self, dataset):
--> 288 java_model = self._fit_java(dataset)
289 model = self._create_model(java_model)
290 return self._copyValues(model)
/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/ml/wrapper.py in _fit_java(self, dataset)
283 """
284 self._transfer_params_to_java()
--> 285 return self._java_obj.fit(dataset._jdf)
286
287 def _fit(self, dataset):
/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py in __call__(self, *args)
1158 answer = self.gateway_client.send_command(command)
1159 return_value = get_return_value(
-> 1160 answer, self.gateway_client, self.target_id, self.name)
1161
1162 for temp_arg in temp_args:
/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py in deco(*a, **kw)
61 def deco(*a, **kw):
62 try:
---> 63 return f(*a, **kw)
64 except py4j.protocol.Py4JJavaError as e:
65 s = e.java_exception.toString()
/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
318 raise Py4JJavaError(
319 "An error occurred while calling {0}{1}{2}.\n".
--> 320 format(target_id, ".", name), value)
321 else:
322 raise Py4JError(
Py4JJavaError: An error occurred while calling o596.fit.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 60.0 failed 1 times, most recent failure: Lost task 2.0 in stage 60.0 (TID 86, localhost, executor driver): org.apache.spark.SparkException: Failed to execute user defined function($anonfun$3: (struct<month_double_VectorAssembler_42f79ae7f99735f04859:double,air_time_double_VectorAssembler_42f79ae7f99735f04859:double,carrier_fact:vector,dest_fact:vector,plane_age_double_VectorAssembler_42f79ae7f99735f04859:double>) => vector)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.sort_addToSorter$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:216)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1092)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1083)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1018)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:809)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Values to assemble cannot be null.
at org.apache.spark.ml.feature.VectorAssembler$$anonfun$assemble$1.apply(VectorAssembler.scala:163)
at org.apache.spark.ml.feature.VectorAssembler$$anonfun$assemble$1.apply(VectorAssembler.scala:146)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at org.apache.spark.ml.feature.VectorAssembler$.assemble(VectorAssembler.scala:146)
at org.apache.spark.ml.feature.VectorAssembler$$anonfun$3.apply(VectorAssembler.scala:99)
at org.apache.spark.ml.feature.VectorAssembler$$anonfun$3.apply(VectorAssembler.scala:98)
... 24 more
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1587)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1586)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1586)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1820)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1769)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1758)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2131)
at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1092)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.fold(RDD.scala:1086)
at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1.apply(RDD.scala:1155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.treeAggregate(RDD.scala:1131)
at org.apache.spark.ml.classification.LogisticRegression.train(LogisticRegression.scala:518)
at org.apache.spark.ml.classification.LogisticRegression.train(LogisticRegression.scala:488)
at org.apache.spark.ml.classification.LogisticRegression.train(LogisticRegression.scala:278)
at org.apache.spark.ml.Predictor.fit(Predictor.scala:118)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Failed to execute user defined function($anonfun$3: (struct<month_double_VectorAssembler_42f79ae7f99735f04859:double,air_time_double_VectorAssembler_42f79ae7f99735f04859:double,carrier_fact:vector,dest_fact:vector,plane_age_double_VectorAssembler_42f79ae7f99735f04859:double>) => vector)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.sort_addToSorter$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:216)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1092)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1083)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1018)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:809)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
... 1 more
Caused by: org.apache.spark.SparkException: Values to assemble cannot be null.
at org.apache.spark.ml.feature.VectorAssembler$$anonfun$assemble$1.apply(VectorAssembler.scala:163)
at org.apache.spark.ml.feature.VectorAssembler$$anonfun$assemble$1.apply(VectorAssembler.scala:146)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at org.apache.spark.ml.feature.VectorAssembler$.assemble(VectorAssembler.scala:146)
at org.apache.spark.ml.feature.VectorAssembler$$anonfun$3.apply(VectorAssembler.scala:99)
at org.apache.spark.ml.feature.VectorAssembler$$anonfun$3.apply(VectorAssembler.scala:98)
... 24 more
Tools
I'm using an online PySpark cluster with CloudxLabs.com (trial version).
Maybe there are some NULL values in the data set. You'll have to take care of those first, as indicated by the error "Values to assemble cannot be null."
Besides dropping the rows with missing values, you can impute them, for example with the mean or median of the column.
A second option is using XGBoost for regression, which handles missing values automatically.
For example, dropping rows with missing values in pandas:
import pandas as pd

df = pd.DataFrame({'Last_Name': ['Smith', None, 'Brown'],
                   'First_Name': ['John', 'Mike', 'Bill'],
                   'Age': [35, 45, None]})
print(df)
  Last_Name First_Name   Age
0     Smith       John  35.0
1      None       Mike  45.0
2     Brown       Bill   NaN

df2 = df.dropna()
print(df2)
  Last_Name First_Name   Age
0     Smith       John  35.0
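Since the DataCamp pipeline uses PySpark rather than pandas, the same clean-up with the Spark DataFrame API would look roughly like the sketch below; the column names are taken from the VectorAssembler error above and may need adjusting to your actual input columns:
# Columns taken from the VectorAssembler error message; adjust as needed.
feature_cols = ["month", "air_time", "plane_age"]

# Option 1: drop rows where any of these columns is null
training = training.dropna(subset=feature_cols)

# Option 2: fill missing numeric values with a constant (or a precomputed mean)
# training = training.fillna(0, subset=feature_cols)

best_lr = lr.fit(training)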
Also xgboost can be applied as below:
https://www.datacamp.com/community/tutorials/xgboost-in-python

esp8266 RTOS blink example doesn't work

I have a problem with the RTOS firmware on the ESP8266 (I have an ESP-12E). After flashing the firmware and reading from the UART, it stays stuck on these lines:
ets Jan 8 2013,rst cause:2, boot mode:(3,0)
load 0x40100000, len 31584, room 16
tail 0
chksum 0x24
load 0x3ffe8000, len 944, room 8
tail 8
chksum 0x9e
load 0x3ffe83b0, len 1080, room 0
tail 8
chksum 0x60
csum 0x60
Now I will explain my HW setup:
GPIO15 -> Gnd
EN -> Vcc
GPIO0 -> Gnd (when flashing)
GPIO0 -> Vcc (normal mode)
For the toolchain I've followed this tutorial and it works well:
http://microcontrollerkits.blogspot.it/2015/12/esp8266-eclipse-development.html
Then I started writing my RTOS blink example; here is my user_main.c code:
#include "esp_common.h"
#include "gpio.h"

void task2(void *pvParameters)
{
    printf("Hello, welcome to client!\r\n");
    while (1)
    {
        // Delay and turn on
        vTaskDelay(300 / portTICK_RATE_MS);
        GPIO_OUTPUT_SET(5, 1);
        // Delay and LED off
        vTaskDelay(300 / portTICK_RATE_MS);
        GPIO_OUTPUT_SET(5, 0);
    }
}

/******************************************************************************
 * FunctionName : user_rf_cal_sector_set
 * Description  : SDK just reserved 4 sectors, used for rf init data and parameters.
 *                We add this function to force users to set rf cal sector, since
 *                we don't know which sector is free in user's application.
 *                sector map for last several sectors : ABCCC
 *                A : rf cal
 *                B : rf init data
 *                C : sdk parameters
 * Parameters   : none
 * Returns      : rf cal sector
 *******************************************************************************/
uint32 user_rf_cal_sector_set(void)
{
    flash_size_map size_map = system_get_flash_size_map();
    uint32 rf_cal_sec = 0;

    switch (size_map) {
        case FLASH_SIZE_4M_MAP_256_256:
            rf_cal_sec = 128 - 5;
            break;
        case FLASH_SIZE_8M_MAP_512_512:
            rf_cal_sec = 256 - 5;
            break;
        case FLASH_SIZE_16M_MAP_512_512:
        case FLASH_SIZE_16M_MAP_1024_1024:
            rf_cal_sec = 512 - 5;
            break;
        case FLASH_SIZE_32M_MAP_512_512:
        case FLASH_SIZE_32M_MAP_1024_1024:
            rf_cal_sec = 1024 - 5;
            break;
        default:
            rf_cal_sec = 0;
            break;
    }
    return rf_cal_sec;
}

/******************************************************************************
 * FunctionName : user_init
 * Description  : entry of user application, init user function here
 * Parameters   : none
 * Returns      : none
 *******************************************************************************/
void user_init(void)
{
    uart_init_new();
    printf("SDK version:%s\n", system_get_sdk_version());

    // Configure pin as GPIO5
    PIN_FUNC_SELECT(PERIPHS_IO_MUX_GPIO5_U, FUNC_GPIO5);

    xTaskCreate(task2, "tsk2", 256, NULL, 2, NULL);
}
I also post the flash commands; the first is executed once, the second every time I modify the code:
c:/Espressif/utils/ESP8266/esptool.exe -p COM3 write_flash -ff 40m -fm qio -fs 32m 0x3FC000 c:/Espressif/ESP8266_RTOS_SDK/bin/esp_init_data_default.bin 0x3FE000 c:/Espressif/ESP8266_RTOS_SDK/bin/blank.bin 0x7E000 c:/Espressif/ESP8266_RTOS_SDK/bin/blank.bin
c:/Espressif/utils/ESP8266/esptool.exe -p COM3 -b 256000 write_flash -ff 40m -fm qio -fs 32m 0x00000 firmware/eagle.flash.bin 0x40000 firmware/eagle.irom0text.bin
Is there something wrong? I really don't understand why it doesn't work.
When I try the NON-OS examples they work very well.
I had the same problem as you. This issue is caused by an incorrect address for eagle.irom0text.bin.
So I changed the address of eagle.irom0text.bin from 0x40000 (0x10000) to 0x20000 and it worked well for me.
[RTOS SDK version: 1.4.2(f57d61a)]
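Applied to the second flash command from the question, that fix would look like this (same port, baud rate, and flash options assumed):
c:/Espressif/utils/ESP8266/esptool.exe -p COM3 -b 256000 write_flash -ff 40m -fm qio -fs 32m 0x00000 firmware/eagle.flash.bin 0x20000 firmware/eagle.irom0text.bin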
The correct flash commands in common_rtos.mk (ESP-12E) are as follows.
For flashinit:
flashinit:
	$(vecho) "Flash init data default and blank data."
	$(ESPTOOL) -p $(ESPPORT) write_flash $(flashimageoptions) 0x3fc000 $(SDK_BASE)/bin/esp_init_data_default.bin
	$(ESPTOOL) -p $(ESPPORT) write_flash $(flashimageoptions) 0x3fe000 $(SDK_BASE)/bin/blank.bin
For flash:
flash: all
ifeq ($(app), 0)
	$(ESPTOOL) -p $(ESPPORT) -b $(ESPBAUD) write_flash $(flashimageoptions) 0x00000 $(FW_BASE)/eagle.flash.bin 0x20000 $(FW_BASE)/eagle.irom0text.bin
else
ifeq ($(boot), none)
	$(ESPTOOL) -p $(ESPPORT) -b $(ESPBAUD) write_flash $(flashimageoptions) 0x00000 $(FW_BASE)/eagle.flash.bin 0x20000 $(FW_BASE)/eagle.irom0text.bin
else
	$(ESPTOOL) -p $(ESPPORT) -b $(ESPBAUD) write_flash $(flashimageoptions) $(addr) $(FW_BASE)/upgrade/$(BIN_NAME).bin
endif
endif

mprotect errno 22 iOS

I'm developing a jailbroken app on iOS and getting errno 22 when calling
mprotect(p, 1024, PROT_READ | PROT_EXEC)
errno 22 means invalid arguments, but I can't figure out what's wrong. I've aligned p to a multiple of the page size, and I've malloc'd the memory before calling mprotect.
Here's my code and sample output
#define PAGESIZE 4096

FILE *pFile;
pFile = fopen("log.txt", "w");

uint32_t code[] = {
    0xe2800001, // add r0, r0, #1
    0xe12fff1e, // bx lr
};

fprintf(pFile, "Before Execution\n");
p = (uint32_t *)malloc(1024 + PAGESIZE - 1);
if (!p) {
    fprintf(pFile, "Couldn't malloc(1024)");
    perror("Couldn't malloc(1024)");
    exit(errno);
}
fprintf(pFile, "Malloced to %p\n", p);

// round the pointer up to the next page boundary
p = (uint32_t *)(((uintptr_t)p + PAGESIZE - 1) & ~(PAGESIZE - 1));
fprintf(pFile, "Moved pointer to %p\n", p);

fprintf(pFile, "Before Compiling\n");
// copy instructions to function
p[0] = code[0];
p[1] = code[1];
fprintf(pFile, "After Compiling\n");

if (mprotect(p, 1024, PROT_READ | PROT_EXEC)) {
    int err = errno;
    fprintf(pFile, "Couldn't mprotect2: %i\n", errno);
    perror("Couldn't mprotect");
    exit(errno);
}
And output:
Before Execution
Malloced to 0x13611ec00
Moved pointer 0x13611f000
Before Compiling
After Compiling
Couldn't mprotect2: 22
Fixed this by using posix_memalign(). It turns out I wasn't aligning my pointer to the page size correctly.
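A minimal sketch of what that allocation looks like with posix_memalign, assuming the same 4096-byte page size as in the question (getpagesize() could be used instead):
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define PAGESIZE 4096

uint32_t *p = NULL;
// posix_memalign returns an error code directly and does not set errno
int rc = posix_memalign((void **)&p, PAGESIZE, PAGESIZE);
if (rc != 0) {
    fprintf(stderr, "posix_memalign failed: %d\n", rc);
    exit(1);
}

p[0] = 0xe2800001; // add r0, r0, #1
p[1] = 0xe12fff1e; // bx lr

if (mprotect(p, PAGESIZE, PROT_READ | PROT_EXEC) != 0) {
    perror("mprotect");
    exit(1);
}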

IMAPMessage.getRecipients() and IMAPMessage.getAllRecipients() return null

I'm writing an IMAP message poller (to be used from within a business app). I'm able to connect, iterate through the messages in the Inbox, and read their headers and content, but calls to getAllRecipients() and getRecipients(Message.RecipientType.TO) always return null.
Message[] messages = inbox.getMessages();
for (Message message : messages) {
    IMAPMessage imapMessage = (IMAPMessage) message;
    Address[] toRecipients = imapMessage.getRecipients(Message.RecipientType.TO);
    Address[] allRecipients = imapMessage.getAllRecipients();
This is puzzling. The messages in the Inbox were sent with regular mail clients, so there is nothing unusual about them.
The IMAP server is running Dovecot.
* OK Dovecot ready.
A0 CAPABILITY
* CAPABILITY IMAP4rev1 SASL-IR SORT THREAD=REFERENCES MULTIAPPEND UNSELECT LITERAL+ IDLE CHILDREN NAMESPACE LOGIN-REFERRALS STARTTLS AUTH=PLAIN
A0 OK Capability completed.
This is the relevant traffic dump captured with Wireshark while doing the above (and also calling imapMessage.getContent()).
A3 SELECT Inbox
* FLAGS (\Answered \Flagged \Deleted \Seen \Draft)
* OK [PERMANENTFLAGS (\Answered \Flagged \Deleted \Seen \Draft \*)] Flags permitted.
* 2 EXISTS
* 0 RECENT
* OK [UIDVALIDITY 1277135188] UIDs valid
* OK [UIDNEXT 3] Predicted next UID
A3 OK [READ-WRITE] Select completed.
A4 FETCH 1 (BODYSTRUCTURE)
* 1 FETCH (BODYSTRUCTURE ("text" "plain" ("charset" "us-ascii") NIL NIL "7bit" 12 1 NIL NIL NIL))
A4 OK Fetch completed.
A5 FETCH 1 (BODY[TEXT]<0.12>)
* 1 FETCH (BODY[TEXT]<0> {12}
here it is
)
A5 OK Fetch completed.
A6 FETCH 1 (FLAGS)
* 1 FETCH (FLAGS (\Seen))
A6 OK Fetch completed.
A7 FETCH 1 (BODY.PEEK[HEADER])
* 1 FETCH (BODY[HEADER] {399}
Return-Path: <EDITED>
Received: from EDITED; Sat, 5 Jun 2010 15:33:13 -0400
Date: Sat, 5 Jun 2010 15:32:40 -0400
From: EDITED
Message-Id: <EDITED>
Subject: Test Message
Lines: 1
)
A7 OK Fetch completed.
A8 FETCH 1 (ENVELOPE INTERNALDATE RFC822.SIZE)
* 1 FETCH (INTERNALDATE "05-Jun-2010 15:33:32 -0400" RFC822.SIZE 411 ENVELOPE ("Sat, 5 Jun 2010 15:32:40 -0400" "Test Message" ((NIL NIL "myediteduser" "myediteddomain")) ((NIL NIL "myediteduser" "myediteddomain")) ((NIL NIL "myediteduser" "myediteddomain")) NIL NIL NIL NIL "<EDITED>"))
A8 OK Fetch completed.
A9 FETCH 2 (BODYSTRUCTURE)
* 2 FETCH (BODYSTRUCTURE (("text" "plain" ("charset" "iso-8859-1") NIL NIL "quoted-printable" 8 0 NIL NIL NIL)("text" "html" ("charset" "iso-8859-1") NIL NIL "quoted-printable" 341 9 NIL NIL NIL) "alternative" ("boundary" "----=_NextPart_000_0003_01CB1137.CCF78C80") NIL NIL))
A9 OK Fetch completed.
Any hints are appreciated. I don't know if there is anything else I should be calling or if there is a setting in the IMAP server. I've looked at all the methods of IMAPMessage in case something had to be called before getRecipients() and getAllRecipients(), but found nothing. I also googled for a while and found nothing else I should have called.
Just to close it: this was an issue with the mail server setup and it's now fixed.
