Summarize Features on a product - google-sheets

I would like to summarize and combine all product features for a certain product.
With the following formula we are able to summarize them, but the category and the feature still need to be split into separate columns.
See the desired-outcome tab in the example file.
Formula used:
=ARRAYFORMULA(proper(REGEXREPLACE(REGEXREPLACE(REGEXREPLACE(REGEXREPLACE(REGEXREPLACE(REGEXREPLACE(REGEXREPLACE(REGEXREPLACE(REGEXREPLACE(REGEXREPLACE(REGEXREPLACE(REGEXREPLACE(REGEXREPLACE({ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!F3&char(9));counta(Kenmerken!F76:F150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!F76:F150&char(9);counta(Kenmerken!F3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!H3&char(9));counta(Kenmerken!H76:H150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!H76:H150&char(9);counta(Kenmerken!H3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!G3&char(9));counta(Kenmerken!G76:G150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!G76:G150&char(9);counta(Kenmerken!G3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!I3&char(9));counta(Kenmerken!I76:I150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!I76:I150&char(9);counta(Kenmerken!I3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!J3&char(9));counta(Kenmerken!J76:J150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!J76:J150&char(9);counta(Kenmerken!J3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!K3&char(9));counta(Kenmerken!K76:K150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!K76:K150&char(9);counta(Kenmerken!K3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!L3&char(9));counta(Kenmerken!L76:L150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!L76:L150&char(9);counta(Kenmerken!L3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!M3&char(9));counta(Kenmerken!M76:M150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!M76:M150&char(9);counta(Kenmerken!M3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!N3&char(9));counta(Kenmerken!N76:N150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!N76:N150&char(9);counta(Kenmerken!N3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!O$3&char(9));counta(Kenmerken!O$76:O$150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!O$76:O$150&char(9);counta(Kenmerken!O$3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!P$3&char(9));counta(Kenmerken!P$76:P$150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!P$76:P$150&char(9);counta(Kenmerken!P$3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!Q$3&char(9));counta(Kenmerken!Q$76:Q$150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!Q$76:Q$150&char(9);counta(Kenmerken!Q$3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!R$3&char(9));counta(Kenmerken!R$76:R$150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!R$76:R$150&char(9);counta(Kenmerken!R$3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!S$3&char(9));counta(Kenmerken!S$76:S$150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!S$76:S$150&char(9);counta(Kenmerken!S$3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!T$3&char(9));counta(Kenmerken!T$76:T$150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!T$76:T$150&char(9);counta(Kenmerken!T$3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!U$3&char(9));counta(Kenmerken!U$76:U$150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!U$76:U$150&char(9);counta(Kenmerken!U$3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!V$3&char(9));counta(Kenmerken!V$76:V$150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!V$76:V$150&char(9);counta(Kenmerken!V$3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!W$3&char(9));counta(Kenmerken!W$76:W$150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!W$76:W$150&char(9);counta(Kenmerken!W$3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!X$3&char(9));counta(Kenmerken!X$76:X$150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!X$76:X$150&char(9);counta(Kenmerken!X$3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!Y$3&char(9));counta(Kenmerken!Y$76:Y$150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!Y$76:Y$150&char(9);counta(Kenmerken!Y$3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!Z$3&char(9));counta(Kenmerken!Z$76:Z$150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!Z$76:Z$150&char(9);counta(Kenmerken!Z$3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!AA$3&char(9));counta(Kenmerken!AA$76:AA$150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!AA$76:AA$150&char(9);counta(Kenmerken!AA$3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!AB$3&char(9));counta(Kenmerken!AB$76:AB$150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!AB$76:AB$150&char(9);counta(Kenmerken!AB$3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!AC$3&char(9));counta(Kenmerken!AC$76:AC$150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!AC$76:AC$150&char(9);counta(Kenmerken!AC$3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!AD$3&char(9));counta(Kenmerken!AD$76:AD$150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!AD$76:AD$150&char(9);counta(Kenmerken!AD$3)));char(9)))
);ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!AE$3&char(9));counta(Kenmerken!AE$76:AE$150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!AE$76:AE$150&char(9);counta(Kenmerken!AE$3)));char(9)))
);iferror(ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!AF$3&char(9));counta(Kenmerken!AF$76:AF$150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!AF$76:AF$150&char(9);counta(Kenmerken!AF$3)));char(9)))
);"");iferror(ArrayFormula(
transpose(split(rept(concatenate(Kenmerken!AG$3&char(9));counta(Kenmerken!AG$76:AG$150));char(9)))
&" "&
transpose(split(concatenate(rept(Kenmerken!AG$76:AG$150&char(9);counta(Kenmerken!AG$3)));char(9)))
);"")};"(.*)Gewicht(.*)";"");"(.*)EAN(.*)";"");"(.*)Hoogte(.*)";"");"(.*)breedte(.*)";"");"(.*)meegeleverd(.*)";"");"(.*)Uitzonderingen fabrieksgarantie(.*)";"");"(.*)Garantietype(.*)";"");"(.*)Fabrieksgarantie(.*)";"");"(.*)lengte(.*)";"");"(.*)Afmetingen(.*)";"");"(.*)Categorieën(.*)";"");"(.*)Breedte(.*)";"");"(.*)Serie(.*)";"")))
See the screenshot for the current situation:
Here, "Materiaal" and "Metaal" should be split.
Example file: https://docs.google.com/spreadsheets/d/1gSRb_t1dxEWiPcFYpBuxvsqTtktuu4nLHQotMYplwPY/edit#gid=0

I found the answer in the meantime. The formula below prefixes each feature value with its category header, joined by the "š" character so the two parts can later be split into separate columns (e.g. with =ARRAYFORMULA(SPLIT(A1:A;"š"))):
={ARRAYFORMULA(Kenmerken!F3&"š"&Kenmerken!F4:F51);ARRAYFORMULA(Kenmerken!G3&"š"&Kenmerken!G4:G51);ARRAYFORMULA(Kenmerken!H3&"š"&Kenmerken!H4:H51);ARRAYFORMULA(Kenmerken!I3&"š"&Kenmerken!I4:I51);ARRAYFORMULA(Kenmerken!J3&"š"&Kenmerken!J4:J51);ARRAYFORMULA(Kenmerken!K3&"š"&Kenmerken!K4:K51);ARRAYFORMULA(Kenmerken!L3&"š"&Kenmerken!L4:L51);ARRAYFORMULA(Kenmerken!M3&"š"&Kenmerken!M4:M51);ARRAYFORMULA(Kenmerken!N3&"š"&Kenmerken!N4:N51);ARRAYFORMULA(Kenmerken!O3&"š"&Kenmerken!O4:O51);ARRAYFORMULA(Kenmerken!P3&"š"&Kenmerken!P4:P51);ARRAYFORMULA(Kenmerken!Q3&"š"&Kenmerken!Q4:Q51);ARRAYFORMULA(Kenmerken!R3&"š"&Kenmerken!R4:R51);ARRAYFORMULA(Kenmerken!S3&"š"&Kenmerken!S4:S51);ARRAYFORMULA(Kenmerken!T3&"š"&Kenmerken!T4:T51);ARRAYFORMULA(Kenmerken!U3&"š"&Kenmerken!U4:U51);ARRAYFORMULA(Kenmerken!V3&"š"&Kenmerken!V4:V51);ARRAYFORMULA(Kenmerken!W3&"š"&Kenmerken!W4:W51);ARRAYFORMULA(Kenmerken!X3&"š"&Kenmerken!X4:X51);ARRAYFORMULA(Kenmerken!Y3&"š"&Kenmerken!Y4:Y51);ARRAYFORMULA(Kenmerken!Z3&"š"&Kenmerken!Z4:Z51)}

Related

ActiveRecord difference between connection_pool.with_connection and connection.execute

In a Rails application using ActiveRecord, what's the difference between:
ActiveRecord::Base.connection_pool.with_connection do |con|
  con.execute("SELECT 1 FROM foo")
end
and:
ActiveRecord::Base.connection.execute("SELECT 1 FROM foo")
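The key semantic difference is that connection_pool.with_connection checks a connection out of the pool only for the duration of the block and always checks it back in afterwards, while ActiveRecord::Base.connection leases a connection to the current thread until it is explicitly released. The toy class below (ToyPool and its connection symbols are made up; this is not ActiveRecord's real implementation) sketches just that checkout/checkin contract:

```ruby
# ToyPool is a made-up illustration, NOT ActiveRecord's implementation.
class ToyPool
  attr_reader :available

  def initialize(conns)
    @available = conns
  end

  # Mirrors connection_pool.with_connection: check a connection out, run the
  # block with it, and always check it back in, even if the block raises.
  def with_connection
    conn = @available.shift
    yield conn
  ensure
    @available.push(conn) if conn
  end
end

pool = ToyPool.new([:conn1, :conn2])
result = pool.with_connection { |c| "SELECT 1 executed on #{c}" }
# The connection is back in the pool here, whether or not the block raised.
```

With ActiveRecord::Base.connection there is no block scoping the lease, which matters in multi-threaded code where held connections can exhaust the pool.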

Which approach performs better for modifying a hash: `Hash#merge` or the double splat operator?

From the example below, which approach is better in terms of performance?
h = {a: 1, b: 2}
{**h, c: 3}   # => {:a=>1, :b=>2, :c=>3}
# or
h.merge(c: 3) # => {:a=>1, :b=>2, :c=>3}
Basic benchmarking
require 'benchmark/ips'

Benchmark.ips do |x|
  x.config(time: 10, warmup: 2)
  h = {a: 1, b: 2}
  x.report("splat") { {**h, c: 3} }
  x.report("merge") { h.merge(c: 3) }
  x.compare!
end
suggests that merge is faster, for example
Warming up --------------------------------------
splat 243.017k i/100ms
merge 315.349k i/100ms
Calculating -------------------------------------
splat 3.388M (±11.8%) i/s - 33.293M in 10.005951s
merge 4.721M (±12.5%) i/s - 46.356M in 10.037133s
Comparison:
merge: 4720869.7 i/s
splat: 3388413.3 i/s - 1.39x (± 0.00) slower
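As a sanity check (plain Ruby, no extra gems), note that both expressions build a new hash and leave the original untouched, so the measured difference is purely in how the new hash is constructed; to truly modify a hash in place, merge! or []= would be the usual choices:

```ruby
h = {a: 1, b: 2}

via_splat = {**h, c: 3}    # new hash built from a literal; h's pairs are copied in
via_merge = h.merge(c: 3)  # also a new hash; merge does not mutate its receiver

# In-place modification avoids allocating a new hash entirely:
h_in_place = h.dup
h_in_place.merge!(c: 3)
```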

CSV::MalformedCSVError: New line must be <"\n\r">

I am trying to parse this file with Ruby's CSV library:
https://www.sec.gov/files/data/broker-dealers/company-information-about-active-broker-dealers/bd070219.txt
However, I am getting an error:
CSV.open(file_name, "r", col_sep: "\t", row_sep: "\n\r").each do |row|
  puts row
end
CSV::MalformedCSVError: New line must be <"\n\r"> not <"\r"> in line 1.
Windows row_sep is "\r\n", not "\n\r". However this CSV is malformed. Looking at it using a hex editor it appears to be using "\r\r\n".
It's tab-delimited.
In addition, it does not use proper quoting: line 247 contains 600 "B" STREET STE. 2204, so you need to turn off the quote character.
quote_char: nil, col_sep: "\t", row_sep: "\r\r\n"
There's an extra tab on the end, each line ends with \t\r\r\n. You can also look at it as using a row_sep of "\r\n" with an extra \r field.
quote_char: nil, col_sep: "\t", row_sep: "\r\n"
Or you can view it as having a row_sep of \t\r\r\n and no extra field.
quote_char: nil, col_sep: "\t", row_sep: "\t\r\r\n"
Either way, it's a mess.
I used a hex editor to look at the file as text and raw data side by side. This let me see what's truly at the end of the line.
87654321 0011 2233 4455 6677 8899 aabb ccdd eeff 0123456789abcdef
00000000: 3030 3030 3030 3139 3034 0941 4252 4148 0000001904.ABRAH
00000010: 414d 2053 4543 5552 4954 4945 5320 434f AM SECURITIES CO
00000020: 5250 4f52 4154 494f 4e09 3030 3832 3934 RPORATION.008294
00000030: 3532 0933 3732 3420 3437 5448 2053 5452 52.3724 47TH STR
00000040: 4545 5420 4354 2e20 4e57 0920 0947 4947 EET CT. NW. .GIG
00000050: 2048 4152 424f 5209 5741 0939 3833 3335 HARBOR.WA.98335
00000060: 090d 0d0a 3030 3030 3030 3233 3033 0950 ....0000002303.P
^^^^^^^^^
Hex 09 0d 0d 0a is \t\r\r\n.
Alternatively, you can print the lines with p and any invisible characters will be revealed.
f = File.open(file_name)
p f.readline
"0000001904\tABRAHAM SECURITIES CORPORATION\t00829452\t3724 47TH STREET CT. NW\t \tGIG HARBOR\tWA\t98335\t\r\r\n"
Use row_sep: :auto instead of row_sep: "\n\r":
CSV.open(file_name, "r", col_sep: "\t", row_sep: :auto).each do |row|
  puts row
end
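The "\r\r\n" diagnosis above can be checked without downloading anything: parse a string with the same trailing tab and \r\r\n line endings (the two records below are invented stand-ins for the real sec.gov data):

```ruby
require "csv"

# Invented sample records mimicking the file's layout: tab-separated fields,
# a stray trailing tab, then \r\r\n as the line ending.
data = "0000001904\tABRAHAM SECURITIES\tWA\t\r\r\n" \
       "0000002303\tEXAMPLE CO\tNY\t\r\r\n"

rows = CSV.parse(data, col_sep: "\t", row_sep: "\r\r\n")
# The stray trailing tab shows up as a nil field at the end of each row.
```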

How to debug msgpack serialisation issue in Google Cloud Dataflow job?

I have a Google Cloud Dataflow job with which I would like to extract named entities from text using a specific spaCy model, neural coref.
Running the extraction without Beam I can extract entities, but when I run it with the DirectRunner the job fails with a serialisation error from msgpack. I am not sure how to proceed in debugging this.
My requirements are quite barebones:
apache-beam[gcp]==2.4
spacy==2.0.12
ujson==1.35
The issue might be related to how spaCy and Beam interact, as the stacktrace shows spaCy dumping out internal functions, which it should not be doing.
Weird spacy log behaviour from stacktrace:
T4: <class 'entity.extract_entities.EntityExtraction'>
# T4
D2: <dict object at 0x1126c0398>
T4: <class 'spacy.lang.en.English'>
# T4
D2: <dict object at 0x1126b54b0>
D2: <dict object at 0x1126d1168>
F2: <function is_alpha at 0x11266d320>
# F2
F2: <function is_ascii at 0x112327c08>
# F2
F2: <function is_digit at 0x11266d398>
# F2
F2: <function is_lower at 0x11266d410>
# F2
F2: <function is_punct at 0x112327b90>
# F2
F2: <function is_space at 0x11266d488>
# F2
F2: <function is_title at 0x11266d500>
# F2
F2: <function is_upper at 0x11266d578>
# F2
F2: <function like_url at 0x11266d050>
# F2
F2: <function like_num at 0x110d55140>
# F2
F2: <function like_email at 0x112327f50>
# F2
Fu: <functools.partial object at 0x11266c628>
F2: <function _create_ftype at 0x1070af500>
# F2
T1: <type 'functools.partial'>
F2: <function _load_type at 0x1070af398>
# F2
# T1
F2: <function is_stop at 0x11266d5f0>
# F2
D2: <dict object at 0x1126b7168>
T4: <type 'set'>
# T4
# D2
# Fu
F2: <function is_oov at 0x11266d668>
# F2
F2: <function is_bracket at 0x112327cf8>
# F2
F2: <function is_quote at 0x112327d70>
# F2
F2: <function is_left_punct at 0x112327de8>
# F2
F2: <function is_right_punct at 0x112327e60>
# F2
F2: <function is_currency at 0x112327ed8>
# F2
Fu: <functools.partial object at 0x110d49ba8>
F2: <function _get_attr_unless_lookup at 0x1106e26e0>
# F2
F2: <function lower at 0x11266d140>
# F2
D2: <dict object at 0x112317c58>
# D2
D2: <dict object at 0x110e38168>
# D2
D2: <dict object at 0x112669c58>
# D2
# Fu
F2: <function word_shape at 0x11266d0c8>
# F2
F2: <function prefix at 0x11266d1b8>
# F2
F2: <function suffix at 0x11266d230>
# F2
F2: <function get_prob at 0x11266d6e0>
# F2
F2: <function cluster at 0x11266d2a8>
# F2
F2: <function _return_en at 0x11266f0c8>
# F2
# D2
B2: <built-in function unpickle_vocab>
# B2
T4: <type 'spacy.strings.StringStore'>
# T4
My current hypothesis is that there is some problem with my setup.py, but I am not sure what is causing the issue.
The full stacktrace is:
/Users/chris/coref_entity_extraction/venv/lib/python2.7/site-packages/msgpack_numpy.py:183: DeprecationWarning: encoding is deprecated, Use raw=False instead.
return _unpackb(packed, **kwargs)
/Users/chris/coref_entity_extraction/venv/lib/python2.7/site-packages/msgpack_numpy.py:132: DeprecationWarning: encoding is deprecated.
use_bin_type=use_bin_type)
[... the same spaCy object dump as shown in the excerpt above ...]
Traceback (most recent call last):
File "/Users/chris/.pyenv/versions/2.7.14/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/Users/chris/.pyenv/versions/2.7.14/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/Users/chris/coref_entity_extraction/main.py", line 29, in <module>
run()
File "/Users/chris/coref_entity_extraction/main.py", line 24, in run
entities = records | 'ExtractEntities' >> beam.ParDo(EntityExtraction())
File "/Users/chris/coref_entity_extraction/venv/lib/python2.7/site-packages/apache_beam/transforms/core.py", line 784, in __init__
super(ParDo, self).__init__(fn, *args, **kwargs)
File "/Users/chris/coref_entity_extraction/venv/lib/python2.7/site-packages/apache_beam/transforms/ptransform.py", line 638, in __init__
self.fn = pickler.loads(pickler.dumps(self.fn))
File "/Users/chris/coref_entity_extraction/venv/lib/python2.7/site-packages/apache_beam/internal/pickler.py", line 204, in dumps
s = dill.dumps(o)
File "/Users/chris/coref_entity_extraction/venv/lib/python2.7/site-packages/dill/dill.py", line 259, in dumps
dump(obj, file, protocol, byref, fmode, recurse)#, strictio)
File "/Users/chris/coref_entity_extraction/venv/lib/python2.7/site-packages/dill/dill.py", line 252, in dump
pik.dump(obj)
File "/Users/chris/.pyenv/versions/2.7.14/lib/python2.7/pickle.py", line 224, in dump
self.save(obj)
File "/Users/chris/.pyenv/versions/2.7.14/lib/python2.7/pickle.py", line 331, in save
self.save_reduce(obj=obj, *rv)
File "/Users/chris/.pyenv/versions/2.7.14/lib/python2.7/pickle.py", line 425, in save_reduce
save(state)
File "/Users/chris/.pyenv/versions/2.7.14/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/Users/chris/coref_entity_extraction/venv/lib/python2.7/site-packages/apache_beam/internal/pickler.py", line 172, in new_save_module_dict
return old_save_module_dict(pickler, obj)
File "/Users/chris/coref_entity_extraction/venv/lib/python2.7/site-packages/dill/dill.py", line 841, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/Users/chris/.pyenv/versions/2.7.14/lib/python2.7/pickle.py", line 655, in save_dict
self._batch_setitems(obj.iteritems())
File "/Users/chris/.pyenv/versions/2.7.14/lib/python2.7/pickle.py", line 692, in _batch_setitems
save(v)
File "/Users/chris/.pyenv/versions/2.7.14/lib/python2.7/pickle.py", line 331, in save
self.save_reduce(obj=obj, *rv)
File "/Users/chris/.pyenv/versions/2.7.14/lib/python2.7/pickle.py", line 425, in save_reduce
save(state)
File "/Users/chris/.pyenv/versions/2.7.14/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/Users/chris/coref_entity_extraction/venv/lib/python2.7/site-packages/apache_beam/internal/pickler.py", line 172, in new_save_module_dict
return old_save_module_dict(pickler, obj)
File "/Users/chris/coref_entity_extraction/venv/lib/python2.7/site-packages/dill/dill.py", line 841, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/Users/chris/.pyenv/versions/2.7.14/lib/python2.7/pickle.py", line 655, in save_dict
self._batch_setitems(obj.iteritems())
File "/Users/chris/.pyenv/versions/2.7.14/lib/python2.7/pickle.py", line 687, in _batch_setitems
save(v)
File "/Users/chris/.pyenv/versions/2.7.14/lib/python2.7/pickle.py", line 331, in save
self.save_reduce(obj=obj, *rv)
File "/Users/chris/.pyenv/versions/2.7.14/lib/python2.7/pickle.py", line 401, in save_reduce
save(args)
File "/Users/chris/.pyenv/versions/2.7.14/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/Users/chris/.pyenv/versions/2.7.14/lib/python2.7/pickle.py", line 568, in save_tuple
save(element)
File "/Users/chris/.pyenv/versions/2.7.14/lib/python2.7/pickle.py", line 306, in save
rv = reduce(self.proto)
File "vectors.pyx", line 108, in spacy.vectors.Vectors.__reduce__
File "vectors.pyx", line 409, in spacy.vectors.Vectors.to_bytes
File "/Users/chris/coref_entity_extraction/venv/lib/python2.7/site-packages/spacy/util.py", line 485, in to_bytes
serialized[key] = getter()
File "vectors.pyx", line 404, in spacy.vectors.Vectors.to_bytes.serialize_weights
File "/Users/chris/coref_entity_extraction/venv/lib/python2.7/site-packages/msgpack_numpy.py", line 165, in packb
return Packer(**kwargs).pack(o)
File "msgpack/_packer.pyx", line 282, in msgpack._cmsgpack.Packer.pack
File "msgpack/_packer.pyx", line 288, in msgpack._cmsgpack.Packer.pack
File "msgpack/_packer.pyx", line 285, in msgpack._cmsgpack.Packer.pack
File "msgpack/_packer.pyx", line 232, in msgpack._cmsgpack.Packer._pack
File "msgpack/_packer.pyx", line 279, in msgpack._cmsgpack.Packer._pack
TypeError: can not serialize 'buffer' object
I have no idea how to debug this issue with Beam. To reproduce it, I have set up a repo with instructions for setting everything up: https://github.com/swartchris8/coref_barebones
Are you able to run the same code from a regular Python program (not from a Beam DoFn)?
If not, check whether you are storing any non-serializable state in your Beam DoFn (or any other function that Beam will serialize). Such state prevents Beam runners from serializing those functions to send them to workers, so it should be avoided.
In the end I got rid of the issue by changing the installed package versions. Debugging the Beam setup process is quite painful, though; my approach was just to try different package permutations manually.

rails prawn undefined method rowspan

I get the following error for the code below: undefined method 'rowspan=' for #<Prawn::Table::Cell::Text:0x9477540>
table [
  [{content: "<b>control</b>", rowspan: 2},
   {content: "time", colspan: 2},
   "order",
   {content: "count", colspan: 6}]
], cell_style: {size: 10, inline_format: true}
I followed the Prawn manual and cannot see what I did wrong. I am using prawn 0.12.0.
According to the Prawn Google group, colspan and rowspan were not introduced until a later release: https://groups.google.com/forum/#!searchin/prawn-ruby/rowspan/prawn-ruby/G-QHFUZheMI/3a4pNnLur0EJ
Updating to the latest master gem from GitHub worked for me:
git clone https://github.com/prawnpdf/prawn.git
Create a directory to test the manual example.
Run bundle init to create a Gemfile in that directory and add this line:
gem 'prawn', :path => '/path/to/your/local/prawn/git/clone/dir'
Create the span_example.rb file from the manual, and set it up to use the bundler Gemfile like this:
require 'rubygems'
require 'bundler/setup'
Bundler.require

Prawn::Document.generate('span_example.pdf') do
  table([
    ["A", {content: "2x1", colspan: 2}, "B"],
    [{content: "1x2", rowspan: 2}, "C", "D", "E"],
    [{content: "2x2", colspan: 2, rowspan: 2}, "F"],
    ["G", "H"]
  ])
end
Then run:
bundle install
ruby span_example.rb
open span_example.pdf
Voilà!
