mxnet cpu memory leak when running inference on GPT2 model

I am testing GPT2.
This code takes a question and predicts the next word.
def chat(model, vocab, sentence):
    q = sentence
    if q == 'quit':
        return
    q_tok = tok(q)
    a = ''
    a_tok = []
    break_check_idx = 0
    prev_gen = ''
    while 1:
        input_ids = mx.nd.array([vocab[U_TKN]] + vocab[q_tok] +
                                vocab[EOS, SENT] +
                                vocab[EOS, S_TKN] +
                                vocab[a_tok]).expand_dims(axis=0)  # <--- Here
        pred = model(input_ids.as_in_context(ctx))  # <--- Here
        gen = vocab.to_tokens(
            mx.nd.argmax(
                pred,
                axis=-1).squeeze().astype('int').asnumpy().tolist())[-1]  # <--- Here
        if gen == EOS:
            break
        if prev_gen == gen:
            break_check_idx += 1
            if break_check_idx == 5:
                break_check_idx = 0
                prev_gen = ''
                return '๑°⌓°๑ ...?'
        prev_gen = gen
        a += gen.replace('▁', ' ')
        a_tok = tok(a)
    return a.strip()
Tracking memory with several tools shows that the memory is allocated at the lines marked "# <--- Here".
No matter how much I search, I cannot find the cause. Where is the memory leaking?
I have tried del, mx.nd.waitall(), gc.collect(), ctx.empty_cache(), and jemalloc; none of them helped.

Related

Precision and recall misunderstanding

In pycocotools, the cocoeval.py script contains a COCOeval class, and this class has an accumulate function for calculating precision and recall. Does anyone know what the npig variable is? Does it stand for negative-positive, or something else?
I ask because I have seen this formula for recall: Recall = (True Positive)/(True Positive + False Negative)
Can I just use the precision and recall entries in the self.eval dictionary to get the precision and recall of the model I'm testing, and to plot a precision-recall curve?
And is the scores variable the F1 score?
I also don't understand what T, R, K, A and M are doing here.
How can I print precision and recall in the terminal?
def accumulate(self, p = None):
    '''
    Accumulate per image evaluation results and store the result in self.eval
    :param p: input params for evaluation
    :return: None
    '''
    print('Accumulating evaluation results...')
    tic = time.time()
    if not self.evalImgs:
        print('Please run evaluate() first')
    # allows input customized parameters
    if p is None:
        p = self.params
    p.catIds = p.catIds if p.useCats == 1 else [-1]
    T = len(p.iouThrs)
    R = len(p.recThrs)
    K = len(p.catIds) if p.useCats else 1
    A = len(p.areaRng)
    M = len(p.maxDets)
    precision = -np.ones((T,R,K,A,M)) # -1 for the precision of absent categories
    recall = -np.ones((T,K,A,M))
    scores = -np.ones((T,R,K,A,M))
    # create dictionary for future indexing
    _pe = self._paramsEval
    catIds = _pe.catIds if _pe.useCats else [-1]
    setK = set(catIds)
    setA = set(map(tuple, _pe.areaRng))
    setM = set(_pe.maxDets)
    setI = set(_pe.imgIds)
    # get inds to evaluate
    k_list = [n for n, k in enumerate(p.catIds) if k in setK]
    m_list = [m for n, m in enumerate(p.maxDets) if m in setM]
    a_list = [n for n, a in enumerate(map(lambda x: tuple(x), p.areaRng)) if a in setA]
    i_list = [n for n, i in enumerate(p.imgIds) if i in setI]
    I0 = len(_pe.imgIds)
    A0 = len(_pe.areaRng)
    # retrieve E at each category, area range, and max number of detections
    for k, k0 in enumerate(k_list):
        Nk = k0*A0*I0
        for a, a0 in enumerate(a_list):
            Na = a0*I0
            for m, maxDet in enumerate(m_list):
                E = [self.evalImgs[Nk + Na + i] for i in i_list]
                E = [e for e in E if not e is None]
                if len(E) == 0:
                    continue
                dtScores = np.concatenate([e['dtScores'][0:maxDet] for e in E])
                # different sorting method generates slightly different results.
                # mergesort is used to be consistent as Matlab implementation.
                inds = np.argsort(-dtScores, kind='mergesort')
                dtScoresSorted = dtScores[inds]
                dtm = np.concatenate([e['dtMatches'][:,0:maxDet] for e in E], axis=1)[:,inds]
                dtIg = np.concatenate([e['dtIgnore'][:,0:maxDet] for e in E], axis=1)[:,inds]
                gtIg = np.concatenate([e['gtIgnore'] for e in E])
                npig = np.count_nonzero(gtIg==0 )
                if npig == 0:
                    continue
                tps = np.logical_and( dtm, np.logical_not(dtIg) )
                fps = np.logical_and(np.logical_not(dtm), np.logical_not(dtIg) )
                tp_sum = np.cumsum(tps, axis=1).astype(dtype=np.float)
                fp_sum = np.cumsum(fps, axis=1).astype(dtype=np.float)
                for t, (tp, fp) in enumerate(zip(tp_sum, fp_sum)):
                    tp = np.array(tp)
                    fp = np.array(fp)
                    nd = len(tp)
                    rc = tp / npig
                    pr = tp / (fp+tp+np.spacing(1))
                    q = np.zeros((R,))
                    ss = np.zeros((R,))
                    if nd:
                        recall[t,k,a,m] = rc[-1]
                    else:
                        recall[t,k,a,m] = 0
                    # numpy is slow without cython optimization for accessing elements
                    # use python array gets significant speed improvement
                    pr = pr.tolist(); q = q.tolist()
                    for i in range(nd-1, 0, -1):
                        if pr[i] > pr[i-1]:
                            pr[i-1] = pr[i]
                    inds = np.searchsorted(rc, p.recThrs, side='left')
                    try:
                        for ri, pi in enumerate(inds):
                            q[ri] = pr[pi]
                            ss[ri] = dtScoresSorted[pi]
                    except:
                        pass
                    precision[t,:,k,a,m] = np.array(q)
                    scores[t,:,k,a,m] = np.array(ss)
    self.eval = {
        'params': p,
        'counts': [T, R, K, A, M],
        'date': datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
        'precision': precision,
        'recall': recall,
        'scores': scores,
    }
    toc = time.time()
    print('DONE (t={:0.2f}s).'.format( toc-tic))
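
Reading the loop above, npig counts the ground-truth entries whose gtIgnore flag is 0, and rc = tp / npig divides by it, so it appears to play the role of (True Positive + False Negative) in the recall formula. The following is a minimal pure-Python sketch of that inner loop for a single IoU threshold. It is not the pycocotools implementation; pr_curve, matches, num_gt and rec_thrs are hypothetical names introduced only for illustration.

```python
from bisect import bisect_left
from itertools import accumulate


def pr_curve(matches, num_gt, rec_thrs):
    """Sketch of the per-threshold loop in accumulate().

    matches  -- booleans for detections sorted by descending score
                (True = matched a ground truth, False = false positive)
    num_gt   -- number of non-ignored ground truths (the role of npig)
    rec_thrs -- recall thresholds at which precision is sampled
    """
    tp = list(accumulate(1 if m else 0 for m in matches))  # cumulative TPs
    fp = list(accumulate(0 if m else 1 for m in matches))  # cumulative FPs
    recall = [t / num_gt for t in tp]
    precision = [t / (t + f) for t, f in zip(tp, fp)]

    # Make precision non-increasing from left to right (the pr[i-1] = pr[i] step).
    for i in range(len(precision) - 1, 0, -1):
        if precision[i] > precision[i - 1]:
            precision[i - 1] = precision[i]

    # Sample precision at each recall threshold; thresholds that are never
    # reached keep the default value 0.
    sampled = []
    for r in rec_thrs:
        idx = bisect_left(recall, r)
        sampled.append(precision[idx] if idx < len(precision) else 0.0)
    return recall, sampled


recall, sampled = pr_curve([True, False, True, True], num_gt=4,
                           rec_thrs=[0.0, 0.5, 1.0])
print("final recall:", recall[-1])
print("precision at thresholds:", sampled)
```

In this sketch, sampled corresponds to one slice precision[t, :, k, a, m] of the array stored in self.eval, and recall[-1] to recall[t, k, a, m]. The scores array, by contrast, is filled from dtScoresSorted in the code above, so it holds detection confidence scores rather than an F1 value.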

A better way on improving my roman numeral decoder

Quick explanation: I recently started using Codewars to improve my programming skills, and my first challenge was to make a Roman numeral decoder. I went through many versions because I wasn't satisfied with what I had, so I am asking whether there is an easier way of handling all the patterns that Roman numerals have. For example, I is 1, but if I comes before a larger numeral it is subtracted from it: V = 5 but IV = 4.
Here is my code:
function Roman_Numerals_Decoder (roman)
    local Dict = {I = 1, V = 5, X = 10, L = 50, C = 100, D = 500, M = 1000}
    local number = 0
    local i = 1
    while i < #roman + 1 do
        local letter = roman:sub(i,i) -- Gets the current character in the string roman
        if roman:sub(i,i) == "I" and roman:sub(i + 1,i + 1) ~= "I" and roman:sub(i + 1,i + 1) ~= "" then -- Checks for the I pattern when I exists and next isnt I
            number = number + (Dict[roman:sub(i +1,i + 1)] - Dict[roman:sub(i,i)]) -- Taking one away from the next number
            i = i + 2 -- Increase the counter
        else
            number = number + Dict[letter] -- Adds the numbers together if no pattern is found, currently checking only I
            i = i + 1
        end
    end
    return number
end
print(Roman_Numerals_Decoder("MXLIX")) -- 1049 = MXLIX , 2008 = MMVIII
At the moment I am trying to get 1049 (MXLIX) to work, but I am getting 1069. Obviously I am not following a rule, and I feel like the result is more wrong than it should be, because usually when it's incorrect it is off by only 1 or 2.

The algorithm is slightly different: you need to subtract a character's value whenever it has less weight than the character that follows it, not only when the character is I.
function Roman_Numerals_Decoder (roman)
    local Dict = {I = 1, V = 5, X = 10, L = 50, C = 100, D = 500, M = 1000}
    local num = 0
    for i = 1, #roman - 1 do
        local letter = roman:sub(i, i) -- Gets the current character in the string roman
        local letter_p = roman:sub(i + 1, i + 1) -- Gets the next character
        if Dict[letter] < Dict[letter_p] then
            num = num - Dict[letter] -- A smaller value before a larger one is subtracted
            print("-", Dict[letter], num)
        else
            num = num + Dict[letter] -- Otherwise the value is added
            print("+", Dict[letter], num)
        end
    end
    num = num + Dict[roman:sub(-1)] -- The last character is always added
    print("+", Dict[roman:sub(-1)], num)
    return num
end
print(Roman_Numerals_Decoder("MXLIX")) -- 1049 = MXLIX , 2008 = MMVIII
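
For comparison, the same subtract-if-smaller-than-next rule can be sketched in Python. This is only an illustration of the rule described in this answer; decode_roman and ROMAN_VALUES are hypothetical names, and the sketch assumes a non-empty, well-formed numeral.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def decode_roman(roman):
    """Decode a Roman numeral by comparing each character with the next one."""
    total = 0
    for i, letter in enumerate(roman):
        value = ROMAN_VALUES[letter]
        # Value of the following character, or 0 when this is the last one.
        next_value = ROMAN_VALUES[roman[i + 1]] if i + 1 < len(roman) else 0
        if value < next_value:
            total -= value  # smaller value before a larger one is subtracted
        else:
            total += value  # otherwise the value is added
    return total


print(decode_roman("MXLIX"))   # 1049
print(decode_roman("MMVIII"))  # 2008
```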

Why does "array[index]" return "nil"?

This problem seems very simple, but I cannot find a solution for it. Actually, I don't even know what is wrong!
So basically I have this Lua code:
io.write("\nPlease provide the message to be decyphered: ")
message = io.read()
seq = #message
ffib = {}
a = 0
b = 1
c = a + b
fib = 0
while c < (seq - 10) do
    fib = fib + 1
    ffib[fib] = c
    a = b
    b = c
    c = a + b
end
decyphered = ""
for i = 1,seq do
    decyphered = table.concat{decyphered, message:sub(ffib[i],ffib[i])}
end
io.write("\nDecyphered message: ", decyphered, "\n\n")
but accessing ffib[i] inside the for loop returns nil, so the call to message:sub(ffib[i], ...) throws an error.
When I access ffib's values manually, ffib[1] for example, it works all right; it only goes wrong when I access it with a loop variable.
Somewhere else in my code I have this:
io.write("\nPlease provide the message to be cyphered: ")
message = io.read()
cyphered = ""
seq = #message
ffib = {}
a = 0
b = 1
c = a + b
for fib = 1,seq do
    ffib[fib] = c
    a = b
    b = c
    c = a + b
end
which is basically the same thing, but it uses a for loop instead of a while loop, and it works just fine!
Please help me solve this; I am going insane.

Alright, I figured it out!
io.write("\nPlease provide the message to be decyphered: ")
message = io.read()
seq = #message
ffib = {}
a = 0
b = 1
c = a + b
fib = 0
while c < (seq - 10) do
    fib = fib + 1
    ffib[fib] = c
    a = b
    b = c
    c = a + b
end
decyphered = ""
for i = 1,seq do -- <--------------
    decyphered = table.concat{decyphered, message:sub(ffib[i],ffib[i])}
end
io.write("\nDecyphered message: ", decyphered, "\n\n")
I was using the wrong variable in the for loop, so it looped over the entire message length instead of the length of the Fibonacci array. The nil values came from indexes that were out of bounds!
To correct this, I simply replaced seq with #ffib in the for loop marked by the arrow.
Thanks to everyone who tried to help me anyway!
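
The fix can be sketched in Python as well: iterate over the positions that were actually generated, not over the length of the message. The function names fib_positions and decipher are hypothetical, and the index shift accounts for Lua strings being 1-based while Python strings are 0-based.

```python
def fib_positions(limit):
    """Return the Fibonacci numbers 1, 2, 3, 5, ... strictly below limit,
    built the same way as the while loop above."""
    positions = []
    a, b = 0, 1
    c = a + b
    while c < limit:
        positions.append(c)
        a, b = b, c
        c = a + b
    return positions


def decipher(message):
    positions = fib_positions(len(message) - 10)
    # Loop over the generated positions only; there are fewer of them
    # than there are characters in the message.
    return "".join(message[p - 1] for p in positions)


message = "abcdefghijklmnopqrst"  # 20 characters
print(fib_positions(len(message) - 10))  # [1, 2, 3, 5, 8]
print(decipher(message))                 # abceh
```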

This part doesn't make much sense, I think:
while c < (seq - 10) do
Why the minus 10? ffib will have fewer entries than seq, while the loop after it expects a value in ffib for every index from 1 to seq.
And even if you change it to
while c < seq do
there still won't be enough entries, because there are always fewer Fibonacci numbers below seq than seq itself.
If anything, you might want to do
while c < (seq + 10) do
But even there you will run into an issue once the message reaches a certain length.
I'm also not familiar with that algorithm, but it looks pretty weird to me, and I wonder what it actually accomplishes.
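
To make the length argument concrete, here is a small Python sketch that counts how many entries the while loop produces for a given bound and finds the first message length for which the table is too short. count_fib_below and first_short_length are hypothetical helper names used only for this illustration.

```python
def count_fib_below(limit):
    """Count the entries the while loop would store for 'while c < limit'."""
    count = 0
    a, b = 0, 1
    c = a + b
    while c < limit:
        count += 1
        a, b = b, c
        c = a + b
    return count


def first_short_length(offset, max_len=1000):
    """Smallest message length seq for which 'while c < seq + offset'
    stores fewer than seq entries, or None if none is found up to max_len."""
    for seq in range(1, max_len + 1):
        if count_fib_below(seq + offset) < seq:
            return seq
    return None


print(first_short_length(0))   # 1: 'c < seq' is already too short for length 1
print(first_short_length(10))  # 7: 'c < seq + 10' breaks at length 7
```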

Simstudy package duplicate keys error and variables referenced not previously defined error

I tried running the following code and encountered several errors with the simstudy package.
library(simstudy)
clusterDef <- defData(varname = "u_3", dist = "normal", formula = 0,
                      variance = 25.77, id = "clus") # cluster-level random effect
clusterDef <- defData(clusterDef, varname = "error", dist = "normal", formula = 0,
                      variance = 38.35) # error term
clusterDef <- defData(clusterDef, varname = "ind", dist = "nonrandom",
                      formula = 25) # individuals per cluster
# Generate individual-level random effect and treatment variable
indDef <- defDataAdd(varname = "u_2", dist = "normal", formula = 0,
                     variance = 120.62)
# Generate clusters of data
set.seed(12345)
cohortsw <- genData(3, clusterDef)
cohortswTm <- addPeriods(cohortsw, nPeriods = 6, idvars = "clus", perName = "period")
cohortswTm <- trtStepWedge(cohortswTm, "clus", nWaves = 3, lenWaves = 1, startPer = 1, grpName = "trt")
cohortswTm <- genCluster(cohortswTm, cLevelVar = "clus", numIndsVar = "ind", level1ID = "id")
Error in vecseq(f__, len__, if (allow.cartesian || notjoin ||
!anyDuplicated(f__, : Join results in 2700 rows; more than 468 =
nrow(x)+nrow(i). Check for duplicate key values in i each of which
join to the same group in x over and over again. If that's ok, try
by=.EACHI to run j for each group to avoid the large allocation. If
you are sure you wish to proceed, rerun with allow.cartesian=TRUE.
Otherwise, please search for this error message in the FAQ, Wiki,
Stack Overflow and data.table issue tracker for advice.
cohortswTm <- addColumns(indDef, cohortswTm)
#Define coefficients for time as a categorical variable
timecoeff1 <- -5.42
timecoeff2 <- -5.72
timecoeff3 <- -7.03
timecoeff4 <- -6.13
timecoeff5 <- -9.13
#Generate outcome y
y <- defDataAdd(varname = "Y", formula = "17.87 + 5.0*trt + timecoeff1*I(period == 1) + timecoeff2*I(period == 2) + timecoeff3*I(period == 3) + timecoeff4*I(period == 4) + timecoeff5*I(period == 5) + u_3 + u_2 + error", dist = "normal")
#Add outcome to dataset
cohortswTm <- addColumns(y, cohortswTm)
Error: Variable(s) referenced not previously defined: timecoeff1,
timecoeff2, timecoeff3, timecoeff4, timecoeff5
Does anybody know why I am getting the errors shown above? How would I fix the code to prevent them from occurring?
Any help is much appreciated.

The first error is generated because you are trying to create individual-level data within each cluster, but each cluster appears repeatedly (once for each of the 6 periods). genCluster expects cLevelVar to be a unique id. In this case, you can generate individuals for each cluster in each time period (25 of them, the value of ind) by modifying the genCluster command to be
cohortswTm <- genCluster(cohortswTm, cLevelVar = "timeID",
                         numIndsVar = "ind", level1ID = "id")
This code creates a "closed" cohort: individuals are observed only in a single period. Generating an open cohort, where individuals might be observed over time as well, is a bit more involved, and is described here.
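
The numbers in the error message are consistent with this explanation. The Python sketch below is only an arithmetic illustration of a join on a non-unique key; it assumes, without having checked the genCluster internals, that one batch of 25 individual rows is created per cluster-period row and then joined back on clus alone. The names x_rows, i_rows and joined are hypothetical.

```python
n_clusters, n_periods, n_inds = 3, 6, 25

# x: one row per cluster per period, as produced by addPeriods (3 * 6 = 18 rows).
x_rows = [(clus, period)
          for clus in range(1, n_clusters + 1)
          for period in range(n_periods)]

# i: assumed individual-level table, 25 rows per row of x, keyed only by clus.
i_rows = [(clus, ind) for (clus, _period) in x_rows for ind in range(n_inds)]

# Joining on the non-unique key clus pairs every x row with every i row
# that belongs to the same cluster.
joined = [(clus, period, ind)
          for (clus, period) in x_rows
          for (clus_i, ind) in i_rows
          if clus_i == clus]

print(len(x_rows) + len(i_rows))  # 468, the nrow(x)+nrow(i) in the message
print(len(joined))                # 2700, the number of joined rows
```

With a key that is unique per cluster-period row, such as timeID, each row of x would match only its own 25 individuals.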
The second error is generated because simstudy data definitions can only include variables that have been defined in the context of the data definition. So, any constants need to be in the formula. (The formula itself can be updated using updateDef and updateDefAdd if you want to explore the effects of different covariate levels.)
This is how y should be defined:
y <- defDataAdd(varname = "Y", formula = "17.87 + 5.0*trt -
5.42*I(period == 1) - 5.72*I(period == 2) - 7.03*I(period == 3) -
6.13*I(period == 4) - 9.13*I(period == 5) + u_3 + u_2 + error",
dist = "normal")

LDA gensim is using only one core out of 16

I have a huge problem training with LdaMulticore. It takes 2.5 hours to get only 25 topics, while only one core is active, and I have 16 of them on Amazon EC2.
How can I optimize this?
Something is bottlenecking this process. When I look at the running processes, only one core is active; after some time all cores become active for a couple of seconds, and then it is back to one core.
numberTopics = 25 #Number of topics
model_gensim = LdaMulticore(num_topics=numberTopics,
                            id2word=dictionary,
                            iterations=10,
                            passes=1,
                            chunksize=50,
                            eta='auto',
                            workers=12)
perp_gensim = []
times_gensim = []
i = 0
max_it = 5
min_prep = np.inf
start = time()
for _ in tqdm_notebook(range(100)):
    model_gensim.update(corpus)
    tmp = np.exp(-1 * model_gensim.log_perplexity(corpus))
    perp_gensim.append(tmp)
    times_gensim.append(time() - start)
    if tmp < min_prep:
        min_prep = tmp
        i = 0
    else:
        i = i + 1
        if i == max_it:
            break
model_gensim.save('results/model_genism/model_genism.model')
with open('results/model_genism/perp_gensim.pickle', 'wb') as f:
    pickle.dump(perp_gensim, f)
with open('results/model_genism/time_gensim.pickle', 'wb') as f:
    pickle.dump(times_gensim, f)
for i, topic in enumerate(model_gensim.get_topics().argsort(axis=1)[:, -10:][:, ::-1], 1):
    print('Topic {}: {}'.format(i, ' '.join([vocabulary[id] for id in topic])))
