Vertex failure while joining 2 big tables in Hive
I have two tables in Hive. Table A has 300M rows and Table B has 26M rows.
I am joining Table A and Table B on three columns: col1, col2, col3.
Below is the query I am using:
create temporary table AB_TEMP AS
select A.col1,A.col2,A.col3,A.col4,A.col5
from A
join B
on A.col1=B.col1 and A.col2=B.col2 and A.col3=B.col3;
I get a "vertex failure" error every time I run this query.
What can I do to overcome this issue?
Below is the error I am getting:
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1617665530644_1398582_10_01, diagnostics=[Task failed, taskId=task_1617665530644_1398582_10_01_000147, diagnostics=[TaskAttempt 0 failed, info=[AttemptID:attempt_1617665530644_1398582_10_01_000147_0 Timed out after 300 secs], TaskAttempt 1 failed, info=[AttemptID:attempt_1617665530644_1398582_10_01_000147_1 Timed out after 300 secs], TaskAttempt 2 failed, info=[AttemptID:attempt_1617665530644_1398582_10_01_000147_2 Timed out after 300 secs], TaskAttempt 3 failed, info=[Container container_e42_1617665530644_1398582_01_002060 timed out]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:220, Vertex vertex_1617665530644_1398582_10_01 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1617665530644_1398582_10_01, diagnostics=[Task failed, taskId=task_1617665530644_1398582_10_01_000147, diagnostics=[TaskAttempt 0 failed, info=[AttemptID:attempt_1617665530644_1398582_10_01_000147_0 Timed out after 300 secs], TaskAttempt 1 failed, info=[AttemptID:attempt_1617665530644_1398582_10_01_000147_1 Timed out after 300 secs], TaskAttempt 2 failed, info=[AttemptID:attempt_1617665530644_1398582_10_01_000147_2 Timed out after 300 secs], TaskAttempt 3 failed, info=[Container container_e42_1617665530644_1398582_01_002060 timed out]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:220, Vertex vertex_1617665530644_1398582_10_01 [Map 1] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
Don't execute this query on Tez; we can complete this in MapReduce.
set hive.execution.engine=mr;
set hive.auto.convert.join=false;
set mapreduce.map.memory.mb=2048;
set mapreduce.reduce.memory.mb=4096;
After setting all of the above parameters, you can run the query and it executes fine.
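As a minimal sketch of the whole thing in one Hive session (the table and column names are the ones from the question, and the memory figures are only the examples given above, so they may need tuning for your cluster):
-- switch to MR and disable the automatic map-join conversion
set hive.execution.engine=mr;
set hive.auto.convert.join=false;
set mapreduce.map.memory.mb=2048;
set mapreduce.reduce.memory.mb=4096;

-- the same join as in the question, now executed as a common (reduce-side) join
create temporary table AB_TEMP as
select A.col1, A.col2, A.col3, A.col4, A.col5
from A
join B
  on A.col1 = B.col1
 and A.col2 = B.col2
 and A.col3 = B.col3;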
Memory settings for Tez (figures are just an example):
--For AM
set tez.am.resource.memory.mb=8192;
set tez.am.java.opts=-Xmx6144m;
--Mapper and Reducer
set tez.reduce.memory.mb=6144;
set hive.tez.container.size=9216;
set hive.tez.java.opts=-Xmx6144m;
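If you would rather stay on Tez than fall back to MR, a sketch of the same session with these settings is below. The only line not taken from this thread is set hive.execution.engine=tez; (a standard Hive property, needed only if the engine was switched to mr earlier), and all figures remain illustrative:
-- only needed if the engine was switched to mr earlier in the session
set hive.execution.engine=tez;
-- Application Master
set tez.am.resource.memory.mb=8192;
set tez.am.java.opts=-Xmx6144m;
-- mapper and reducer containers
set tez.reduce.memory.mb=6144;
set hive.tez.container.size=9216;
set hive.tez.java.opts=-Xmx6144m;

create temporary table AB_TEMP as
select A.col1, A.col2, A.col3, A.col4, A.col5
from A
join B
  on A.col1 = B.col1
 and A.col2 = B.col2
 and A.col3 = B.col3;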
Related
"Ignored or could not reach conclusion" in Dafny
I was confused about whether some added lemmas can help Dafny verify a desired property, mainly because the hint "ignored or could not reach conclusion" was ambiguous. For example, in the following proof:
forall x | x in s
  ensures x.1.Valid()
{
  // assertion 1
  // assertion 2
  // assertion 3
  // assertion 4
  // assertion 5
}
Dafny reports errors on assertions 1-4, but for assertion 5 Dafny says "ignored or could not reach conclusion". When I commented out assertion 3, Dafny reported an error on assertion 5 as well. However, assertion 3 is completely unrelated to assertion 5: assertion 5 has to do with heap objects, while assertion 3 only concerns a sequence of type seq<(int, int)>. I find it difficult to add lemmas if I have no idea whether an added lemma will successfully help prove the subsequent assertions. Why does Dafny choose to ignore assertion 5 when assertion 3 is not commented out? Can I force Dafny not to "ignore" some assertion?
Dataflow - Approx Unique on unbounded source
I'm getting unexpected results streaming in the cloud. My pipeline looks like:
SlidingWindow(60min).every(1min)
  .triggering(Repeatedly.forever(
      AfterWatermark.pastEndOfWindow()
        .withEarlyFirings(AfterProcessingTime
          .pastFirstElementInPane()
          .plusDelayOf(Duration.standardSeconds(30)))
  ))
  .withAllowedLateness(15sec)
  .accumulatingFiredPanes()
.apply("Get UniqueCounts", ApproximateUnique.perKey(.05))
.apply("Window hack filter", ParDo(
    if(window.maxTimestamp.isBeforeNow())
        c.output(element)
))
.toJSON()
.toPubSub()
If that filter isn't there, I get 60 windows per output, apparently because the pubsub sink isn't window aware. So in the examples below, if each time period is a minute, I'd expect to see the unique count grow until 60 minutes, when the sliding window closes.
Using DirectRunner, I get the expected results:
t1: 5
t2: 10
t3: 15
...
tx: growing unique count
In Dataflow, I get weird results:
t1: 5
t2: 10
t3: 0
t4: 0
t5: 2
t6: 0
...
tx: wrong unique count
However, if my unbounded source has older data, I'll get normal-looking results until it catches up, at which point I'll get the wrong results. I was thinking it had to do with my window filter, but removing that didn't change the results. If I do a Distinct() then Count().perKey(), it works, but that slows my pipeline considerably. What am I overlooking?
[Update from the comments] ApproximateUnique inadvertently resets its accumulated value when the result is extracted. This is incorrect when the value is read more than once, as with windows that fire multiple times. Fix (will be in version 2.4): https://github.com/apache/beam/pull/4688
Do the assertions in the luassert library have a `level` parameter similar to the builtin `error` function?
I am currently writing a test suite using busted/luassert, and since I have put some assertions in a separate function I am getting inaccurate stack traces. For example, consider the following test suite (a_spec.lua):
local function my_custom_assertion(x) -- 1
  assert.is_true(x > 0)               -- 2 <-
end                                   -- 3
                                      -- 4
describe("My test suite", function()  -- 5
  it("they are positive", function()  -- 6
    my_custom_assertion(-10)          -- 7 <-
    my_custom_assertion(-20)          -- 8 <-
  end)                                -- 9
end)                                  -- 10
When I run it, my test case fails but the stack trace points to line 2, so I can't tell which of the two assertions was the one that failed.
$ busted spec/a_spec.lua
◼
0 successes / 1 failure / 0 errors / 0 pending : 0.002242 seconds

Failure → spec/a_spec.lua # 6
My test suite they are positive
spec/a_spec.lua:2: Expected objects to be the same.
Passed in:
(boolean) false
Expected:
(boolean) true
Is there a way I could have it point to line 7 or 8 instead? One way this would be possible is if luassert's assert.is_true function had a level parameter similar to the builtin error function. Looking at luassert's source code, it seems that it does care about the stack level, but I haven't been able to figure out whether this feature is internal or exposed to the user somehow.
Instead of creating a custom assertion as a function that calls assert.xyzz(), create a function that returns true or false and register it with assert:register. See the second example in the README.
It turns out that there is a way to solve my actual problem of finding out which of the assertions was the one that fired without needing to change how I write my tests. By invoking busted with the -v (--verbose) option, it prints a full stack trace when an assertion fails instead of just providing a single line number.
$ busted -v spec/a_spec.lua
0 successes / 1 failure / 0 errors / 0 pending : 0.003241 seconds

Failure → spec/a_spec.lua # 6
My test suite they are positive
spec/a_spec.lua:2: Expected objects to be the same.
Passed in:
(boolean) false
Expected:
(boolean) true
stack traceback:
        spec/a_spec.lua:2: in upvalue 'my_custom_assertion'
        spec/a_spec.lua:7: in function <spec/a_spec.lua:6>
That mention of line number 7 lets me know which assertion was the one that failed.
Neo4j batch importer NotFoundException
I'm consistently running into a NotFoundException when using the batch importer to read large nodes and relationship files. I've used the importer successfully before with an even larger dataset, but I've rewritten the way I generate the two files, and I'm trying to figure out why it now throws an error.

The problem
It seems to read the nodes file and then throws an error near the start of the rels file, stating that it cannot find a node. I believe this is because it hasn't really imported all the nodes: it reports importing only half of the nodes in nodes.tsv (2.1m of 4.6m total).

Things I've checked:
- The node numbers in nodes.tsv are sequential and continuous (0 to ~4.5m)
- The node that throws the exception appears in both files (including as both source and target in rels.tsv)
- I can successfully import a smaller subset of my data (~80k nodes) using the same tsv generator script
- Even though the relationships are not sorted on target (only on source), the smaller subset does not throw this exception

The insert command:
./import.sh wiki.db nodes.tsv rels.tsv

Error message
Using Existing Configuration File
.....................
Importing 2129648 Nodes took 6400 seconds
Total import time: 6404 seconds
Exception in thread "main" org.neo4j.graphdb.NotFoundException: id=3608148
    at org.neo4j.unsafe.batchinsert.BatchInserterImpl.getNodeRecord(BatchInserterImpl.java:1215)
    at org.neo4j.unsafe.batchinsert.BatchInserterImpl.createRelationship(BatchInserterImpl.java:777)
    at org.neo4j.batchimport.Importer.importRelationships(Importer.java:154)
    at org.neo4j.batchimport.Importer.doImport(Importer.java:232)
    at org.neo4j.batchimport.Importer.main(Importer.java:83)

The files
nodes.tsv (4578730 lines):
node    name                                l:label  degrees
0       Stroud_railway_station              Page     21
1       ATP–ADP_translocase                 Page     38
2       Pedro_Hernández_Martínez            Page     12
3       Christopher_Lowther                 Page     4
4       Cloncurry_River                     Page     10
5       Neil_Kinnock                        Page     147
6       Free_agent_(business)               Page     10
7       Christian_Hilt                      Page     27
8       2009_Riviera_di_Rimini_Challenger   Page     27

rels.tsv (113322480 lines):
start   end       type
0       3608148   LINKS_TO
0       870126    LINKS_TO
0       1516248   LINKS_TO
0       3493391   LINKS_TO
0       3034096   LINKS_TO
0       1421544   LINKS_TO
0       2808745   LINKS_TO
0       1872783   LINKS_TO
0       1673612   LINKS_TO
Hmm, this seems to be a problem with your CSV file. Did you try to run CSVKit or similar on it? Perhaps you can narrow down the issue by bisecting the nodes file and finding the offending line?
Also try to use the opencsv parser by enabling quotes in your batch.properties, see https://github.com/jexp/batch-import/tree/20#csv-experimental:
batch_import.csv.quotes=true
Or flip it to false. Perhaps you have stray single or double quotes in your text? If so, then please quote it.
websocket client packet unframe/unmask
I am trying to implement the latest WebSocket spec. However, I am unable to get through the unmasking step after a successful handshake. I receive the following WebSocket frame:
<<129,254,1,120,37,93,40,60,25,63,71,88,92,125,80,81,73,
51,91,1,2,53,92,72,85,103,7,19,79,60,74,94,64,47,6,83,
87,58,7,76,87,50,92,83,70,50,68,19,77,41,92,76,71,52,
70,88,2,125,90,85,65,96,15,14,20,107,31,14,28,100,27,9,
17,122,8,72,74,96,15,86,68,37,68,18,76,48,15,28,93,48,
68,6,73,60,70,91,24,122,77,82,2,125,80,81,85,45,18,74,
64,47,91,85,74,51,21,27,20,115,24,27,5,37,69,80,75,46,
18,68,72,45,88,1,2,40,90,82,31,37,69,76,85,103,80,94,
74,46,64,27,5,60,75,87,24,122,25,27,5,47,71,73,81,56,
21,27,93,48,88,76,31,57,77,74,11,55,73,68,73,115,65,81,
31,104,26,14,23,122,8,75,68,52,92,1,2,110,24,27,5,53,
71,80,65,96,15,13,2,125,75,83,75,41,77,82,81,96,15,72,
64,37,92,19,93,48,68,7,5,62,64,93,87,46,77,72,24,40,92,
90,8,101,15,28,83,56,90,1,2,108,6,13,21,122,8,82,64,42,
67,89,92,96,15,93,19,56,28,8,65,101,31,94,16,105,28,10,
20,56,30,14,65,56,27,93,71,106,16,11,17,63,25,4,17,57,
73,89,17,59,29,88,29,106,24,27,5,46,65,72,64,54,77,69,
24,122,66,93,93,49,5,12,8,109,15,28,76,59,90,93,72,56,
76,1,2,41,90,73,64,122,8,89,85,50,75,84,24,122,25,15,
23,105,25,5,19,106,26,14,20,111,25,27,5,53,77,85,66,53,
92,1,2,110,26,13,2,125,95,85,65,41,64,1,2,108,27,10,19,
122,7,2>>
As per the base framing protocol defined here (https://datatracker.ietf.org/doc/html/draft-ietf-hybi-thewebsocketprotocol-17#section-5.2), I have: fin:1, rsv:0, opcode:1, mask:1, length:126
Masked application+payload data comes out to be:
<<87,58,7,76,87,50,92,83,70,50,68,19,77,41,92,76,71,52,70,88,2,125,90,85,65,96,
15,14,20,107,31,14,28,100,27,9,17,122,8,72,74,96,15,86,68,37,68,18,76,48,15,
28,93,48,68,6,73,60,70,91,24,122,77,82,2,125,80,81,85,45,18,74,64,47,91,85,
74,51,21,27,20,115,24,27,5,37,69,80,75,46,18,68,72,45,88,1,2,40,90,82,31,37,
69,76,85,103,80,94,74,46,64,27,5,60,75,87,24,122,25,27,5,47,71,73,81,56,21,
27,93,48,88,76,31,57,77,74,11,55,73,68,73,115,65,81,31,104,26,14,23,122,8,75,
68,52,92,1,2,110,24,27,5,53,71,80,65,96,15,13,2,125,75,83,75,41,77,82,81,96,
15,72,64,37,92,19,93,48,68,7,5,62,64,93,87,46,77,72,24,40,92,90,8,101,15,28,
83,56,90,1,2,108,6,13,21,122,8,82,64,42,67,89,92,96,15,93,19,56,28,8,65,101,
31,94,16,105,28,10,20,56,30,14,65,56,27,93,71,106,16,11,17,63,25,4,17,57,73,
89,17,59,29,88,29,106,24,27,5,46,65,72,64,54,77,69,24,122,66,93,93,49,5,12,8,
109,15,28,76,59,90,93,72,56,76,1,2,41,90,73,64,122,8,89,85,50,75,84,24,122,
25,15,23,105,25,5,19,106,26,14,20,111,25,27,5,53,77,85,66,53,92,1,2,110,26,
13,2,125,95,85,65,41,64,1,2,108,27,10,19,122,7,2>>
While the 32-bit masking key is:
<<37,93,40,60,25,63,71,88,92,125,80,81,73,51,91,1,2,53,92,72,85,103,7,19,79,60,
74,94,64,47,6,83>>
As per https://datatracker.ietf.org/doc/html/draft-ietf-hybi-thewebsocketprotocol-17#section-5.2:
j = i MOD 4
transformed-octet-i = original-octet-i XOR masking-key-octet-j
However, I don't get back my original octets sent from the client side, which are basically an XML packet. Any direction, corrections, or suggestions are greatly appreciated.
I think you've misread the data framing section of the protocol spec. Your interpretation of the first byte (129) is correct: fin + opcode 1, the final (and first) fragment of a text message. The next byte (254) implies that the body of the message is masked and that the following 2 bytes provide its length (payload lengths of 126 or 127 indicate longer messages whose lengths can't be represented in 7 bits: 126 means that the following 2 bytes hold the length; 127 means that the following 8 bytes do). The following 2 bytes, 1,120, imply a message length of 376 bytes. The following 4 bytes, 37,93,40,60, are your mask. The remaining data is your message, which should be transformed as you describe, giving the message
<body xmlns='http://jabber.org/protocol/httpbind' rid='2167299354' to='jaxl.im' xml:lang='en' xmpp:version='1.0' xmlns:xmpp='urn:xmpp:xbosh' ack='1' route='xmpp:dev.jaxl.im:5222' wait='30' hold='1' content='text/xml; charset=utf-8' ver='1.1 0' newkey='a6e44d87b54461e62de3ab7874b184dae4f5d870' sitekey='jaxl-0-0' iframed='true' epoch='1324196722121' height='321' width='1366'/>
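As a quick check of the unmasking arithmetic: the payload starts with the octets 25,63,71,88,92,... and the 4-byte mask 37,93,40,60 repeats, so
25 XOR 37 = 60  ('<')
63 XOR 93 = 98  ('b')
71 XOR 40 = 111 ('o')
88 XOR 60 = 100 ('d')
92 XOR 37 = 121 ('y')
which is the opening "<body" of the XML payload above. In your question you treated 32 bytes as the "32-bit masking key"; only the 4 bytes 37,93,40,60 are the mask, which is why your XOR didn't reproduce the original octets.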