iOS Swift Mi Scale 2 Bluetooth Get Weight - ios

I am writing an app that reads weight measurements from a Xiaomi Mi Scale 2. After reading all available UUIDs, only the "181B" service, specifically the "2A9C" characteristic (Body weight measurement in Bluetooth GATT), sends notifications.
Value data is [2, 164, 178, 7, 1, 1, 2, 58, 56, 253, 255, 240, 60]. Only the last two values vary; the rest is the time and date, which is not currently set (253, 255 are zeroes while the weight on the scale is still varying, until it stabilizes).
Can someone help me extract only the person's weight? Should I be getting the data in a different way, maybe from other UUIDs (like the custom ones: 00001530-0000-3512-2118-0009AF100700, 00001542-0000-3512-2118-0009AF100700), and if so, how do I retrieve them?
Correct answer by Paulw11: You need to look at bit 0 of the first byte to determine whether the weight is in imperial or SI units; the bit is 0, so the data is SI. Then, to get the weight, convert the last two bytes to a 16-bit little-endian integer (60*256+240 = 15,600) and multiply by 0.005 to get 78 kg.
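A minimal Python sketch of the decoding described in this answer (Python is used only for brevity; the arithmetic carries over to Swift unchanged). The function name `decode_weight` is hypothetical, and treating a set unit bit as "divide by 100" is an assumption based on the byte layout reported in another answer on this page, not something confirmed for every firmware.

```python
def decode_weight(payload):
    """Return (value, unit) from a Mi Scale 2 notification payload."""
    data = bytes(payload)
    # Bit 0 of the first byte: 0 = SI (kilograms), 1 = imperial.
    imperial = bool(data[0] & 0x01)
    # The last two bytes form a little-endian 16-bit integer.
    raw = int.from_bytes(data[-2:], "little")
    if imperial:
        # Assumption: imperial readings are reported in 0.01 lb steps.
        return raw / 100, "lb"
    # SI readings are reported in 0.005 kg steps, i.e. raw / 200.
    return raw / 200, "kg"


sample = [2, 164, 178, 7, 1, 1, 2, 58, 56, 253, 255, 240, 60]
value, unit = decode_weight(sample)
print(value, unit)  # 78.0 kg
```

Dividing by 200 instead of multiplying by 0.005 gives the same result while avoiding a small floating-point rounding error.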

In my case, it was a little different:
I got data like this: [207, 0, 0, 178, 2, 0, 0, 0, 0, 0, 127] (6.9 kg), and the solution is:
let bytesArray = [207, 0, 0, 178, 2, 0, 0, 0, 0, 0, 127]
// Bytes 3 and 4 hold the little-endian raw value; convert to Double before scaling
let weight = Double(bytesArray[4] * 256 + bytesArray[3]) * 10.0 / 1000
And now I have my 6.9 kg.

I was using a Mi Smart Scale and received the following byte arrays:
02-A4-B2-07-02-13-06-33-35-FD-FF-EC-09 - 12.7 kg
02-A4-B2-07-02-13-06-3B-17-FD-FF-C8-3C - 77.8 kg
I used the last two bytes (little-endian) to get the weights in kg:
(0x09*256 + 0xEC)/200 = 12.7
(0x3C*256 + 0xC8)/200 = 77.8
My byte array was 13 bytes long.
bytes 0 and 1: control bytes
bytes 2 and 3: year
byte 4: month
byte 5: day
byte 6: hours
byte 7: minutes
byte 8: seconds
bytes 9 and 10: impedance
bytes 11 and 12: weight (divide by 100 for pounds and catty, divide by 200 for kilograms)
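The layout above can be sketched as a single `struct` format string. This is only an illustration of the listed fields: the names `ScaleReading` and `parse_reading` are hypothetical, and little-endian byte order for the 16-bit fields is an assumption inferred from the year (0x07B2 = 1970) and the weight arithmetic shown in this answer.

```python
import struct
from collections import namedtuple

ScaleReading = namedtuple(
    "ScaleReading",
    "control year month day hours minutes seconds impedance raw_weight",
)


def parse_reading(hex_string):
    """Parse a dash-separated 13-byte payload using the layout listed above."""
    data = bytes.fromhex(hex_string.replace("-", ""))
    if len(data) != 13:
        raise ValueError("expected 13 bytes, got %d" % len(data))
    # '<' = little-endian without padding:
    # 2s control, H year, 5 x B date/time fields, H impedance, H weight
    return ScaleReading(*struct.unpack("<2sHBBBBBHH", data))


reading = parse_reading("02-A4-B2-07-02-13-06-33-35-FD-FF-EC-09")
print(reading.year, reading.month, reading.day)  # 1970 2 19
print(reading.raw_weight / 200)                  # 12.7
```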


Z3 vZ - Adding constraint improves optimum

I'm new to using Z3 and am trying to model an ILP, which I have already successfully done using the MILP Solver PuLP. I now implemented the same objective function (which is to be minimized) and the same constraints in Z3 and am experiencing strange behaviour. Mainly, that adding a constraint decreases the minimum.
My question is: How can that be? Is it a bug or can it be explained somehow?
More detail, in case needed:
I'm trying to solve a Teacher Assignment Problem. Courses are scheduled ahead of the year and the teachers get the list of courses. They can then select which courses they want to teach, with which priority they want to teach each one, how many workdays (each course lasts several days) they would like to teach, and a max and min number of courses they definitely want to teach. The program gets as input a list of possible teacher-assignments. A teacher-assignment is a tuple consisting of:
teacher-name
event-id
priority of teacher towards event
the distance between teacher and course
The goal of the program is to find a combination of assignments that minimizes:
the average relative deviation 'desired workdays <-> assigned workdays' of all teachers
the maximum relative deviation 'desired workdays <-> assigned workdays' of any teacher
the overall distance of courses to assigned teachers
the sum of priorities (higher priority means less willingness to teach)
Main Constraints:
number of teachers assigned to course must match needed amount of teachers
the number of assigned courses to a teacher must be within the specified min/max range
the courses to which a teacher is assigned may not overlap in time (a list of overlap-sets are given)
To track the average relative deviation and the maximum deviation of workdays two more 'helper-constraints' are introduced:
for each teacher: overload (delta_plus) - underload (delta_minus) = assigned workdays - desired workdays
for each teacher: delta_plus + delta_minus <= max relative deviation (DELTA)
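A small pure-Python sketch of what the first helper constraint does and does not pin down. The equality only fixes the difference delta_plus - delta_minus, so any pair shifted by the same non-negative amount also satisfies it; it is the minimization that is expected to push the pair down to the tight values. The function names are hypothetical, and the numbers for teacher 'fr' are taken from the sample input and output given further down in the question.

```python
def minimal_deltas(assigned_days, desired_days):
    """Smallest non-negative (delta_plus, delta_minus) satisfying the equality."""
    diff = assigned_days - desired_days
    return max(diff, 0), max(-diff, 0)


def satisfies_balance(delta_plus, delta_minus, assigned_days, desired_days):
    """Check the helper constraint: d+ - d- == assigned - desired, both >= 0."""
    return (delta_plus >= 0 and delta_minus >= 0
            and delta_plus - delta_minus == assigned_days - desired_days)


# Teacher 'fr' wants 36 workdays and can only be assigned event 8 (16 days).
print(minimal_deltas(16, 36))             # (0, 20)
print(satisfies_balance(17, 37, 16, 36))  # True: feasible, but not tight
print((17 + 37) / 36, (0 + 20) / 36)      # 1.5 versus roughly 0.56
```

A solution in which both deltas of one teacher are non-zero is therefore feasible but cannot be optimal for positive weights a1 and a2, which is a quick sanity check for any reported model.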
Here you have this as Python code:
from z3 import *
def compute_optimum(a1, a2, a3, a4, worst_case_distance=0):
    """
    Find the optimum solution with weights a1, a2, a3, a4
    (average workday deviation, maximum workday deviation, cumulative linear distance, sum of priority 2 assignments)
    Higher weight = More optimized (value minimized)
    Returns all assignment-tuples which occur in the calculated optimal model.
    """
    print("Powered by Z3")
    print(f"\n\n\n\n ------- FINDING OPTIMUM TO WEIGHTS: a1={a1}, a2={a2}, a3={a3}, a4={a4} -------\n")
    # key: assignment tuple value: z3-Bool
    x = {assignment : Bool('e%i_%s' % (assignment[1], assignment[0])) for assignment in possible_assignments}
    delta_plus = {teacher : Int('d+_%s' % teacher) for teacher in teachers}
    delta_minus = {teacher : Int('d-_%s' % teacher) for teacher in teachers}
    DELTA = Real('DELTA')
    opt = Optimize()
    # constraint1: number of teachers needed per event
    num_available_per_event = {event : len(list(filter(lambda assignment: assignment[1] == event, possible_assignments))) for event in events}
    for event in events:
        num_teachers_to_assign = min(event_size[event], num_available_per_event[event])
        opt.add(Sum( [If(x[assignment], 1, 0) for assignment in x.keys() if assignment[1] == event] ) == num_teachers_to_assign)
    for teacher in teachers:
        # constraint2: max and min number of events for each teacher
        max_events = len(events)
        min_events = 0
        num_assigned_events = Sum( [If(x[assignment], 1, 0) for assignment in x.keys() if assignment[0] == teacher] )
        opt.add(num_assigned_events >= min_events, num_assigned_events <= max_events)
        # constraint3: teacher can't work in multiple overlapping events
        for overlapping_events in event_overlap_sets:
            opt.add(Sum( [If(x[assignment], 1, 0) for assignment in x.keys() if assignment[1] in overlapping_events and assignment[0] == teacher] ) <= 1)
        # constraint4: delta (absolute over and underload of teacher)
        num_teacher_workdays = Sum( [If(x[assignment], event_durations[assignment[1]], 0) for assignment in x.keys() if assignment[0] == teacher])
        opt.add(delta_plus[teacher] >= 0, delta_minus[teacher] >= 0)
        opt.add(delta_plus[teacher] - delta_minus[teacher] == num_teacher_workdays - desired_workdays[teacher])
        # constraint5: DELTA (maximum relative deviation of wished to assigned workdays)
        opt.add(DELTA >= ToReal(delta_plus[teacher] + delta_minus[teacher]) / desired_workdays[teacher])
    #opt.add(DELTA <= 1) # adding this results in better optimum
    average_rel_workday_deviation = Sum( [ToReal(delta_plus[teacher] + delta_minus[teacher]) / desired_workdays[teacher] for teacher in teachers]) / len(teachers)
    overall_distance = Sum( [If(x[assignment], assignment[3], 0) for assignment in x.keys()])
    num_prio2 = Sum( [If(x[assignment], assignment[2]-1, 0) for assignment in x.keys()])
    obj_fun = opt.minimize(
        a1 * average_rel_workday_deviation
        + a2 * DELTA
        + a3 * overall_distance
        + a4 * num_prio2
    )
    #print(opt)
    if opt.check() == sat:
        m = opt.model()
        optimal_assignments = []
        for assignment in x.keys():
            if m.evaluate(x[assignment]):
                optimal_assignments.append(assignment)
        for teacher in teachers:
            print(f"{teacher}: d+ {m.evaluate(delta_plus[teacher])}, d- {m.evaluate(delta_minus[teacher])}")
        #print(m)
        print("DELTA:::", m.evaluate(DELTA))
        print("min value:", obj_fun.value().as_decimal(2))
        return optimal_assignments
    else:
        print("Not satisfiable")
        return []
compute_optimum(1,1,1,1)
Sample input:
teachers = ['fr', 'hö', 'pf', 'bo', 'jö', 'sti', 'bi', 'la', 'he', 'kl', 'sc', 'str', 'ko', 'ba']
events = [5, 6, 7, 8, 9, 10, 11, 12]
event_overlap_sets = [{5, 6}, {8, 9}, {10, 11}, {11, 12}, {12, 13}]
desired_workdays = {'fr': 36, 'hö': 50, 'pf': 30, 'bo': 100, 'jö': 80, 'sti': 56, 'bi': 20, 'la': 140, 'he': 5.0, 'kl': 50, 'sc': 38, 'str': 42, 'ko': 20, 'ba': 20}
event_size = {5: 2, 6: 2, 7: 2, 8: 3, 9: 2, 10: 2, 11: 3, 12: 2}
event_durations = {5: 5.0, 6: 5.0, 7: 5.0, 8: 16, 9: 7.0, 10: 5.0, 11: 16, 12: 5.0}
# assignment: (teacher, event, priority, distance)
possible_assignments = [('he', 5, 1, 11), ('sc', 5, 1, 48), ('str', 5, 1, 199), ('ko', 6, 1, 53), ('jö', 7, 1, 317), ('bo', 9, 1, 56), ('sc', 10, 1, 25), ('ba', 11, 1, 224), ('bo', 11, 1, 312), ('jö', 11, 1, 252), ('kl', 11, 1, 248), ('la', 11, 1, 303), ('pf', 11, 1, 273), ('str', 11, 1, 228), ('kl', 5, 2, 103), ('la', 5, 2, 16), ('pf', 5, 2, 48), ('bi', 6, 2, 179), ('la', 6, 2, 16), ('pf', 6, 2, 48), ('sc', 6, 2, 48), ('str', 6, 2, 199), ('sc', 7, 2, 354), ('sti', 7, 2, 314), ('bo', 8, 2, 298), ('fr', 8, 2, 375), ('hö', 9, 2, 95), ('jö', 9, 2, 119), ('sc', 9, 2, 37), ('sti', 9, 2, 95), ('bi', 10, 2, 211), ('hö', 11, 2, 273), ('bi', 12, 2, 408), ('bo', 12, 2, 318), ('ko', 12, 2, 295), ('la', 12, 2, 305), ('sc', 12, 2, 339), ('str', 12, 2, 218)]
Output (just the delta+ and delta-):
------- FINDING OPTIMUM TO WEIGHTS: a1=1, a2=1, a3=1, a4=1 -------
fr: d+ 17, d- 37
hö: d+ 26, d- 69
pf: d+ 0, d- 25
bo: d+ 41, d- 120
jö: d+ 0, d- 59
sti: d+ 27, d- 71
bi: d+ 0, d- 15
la: d+ 0, d- 119
he: d+ 0, d- 0
kl: d+ 0, d- 50
sc: d+ 0, d- 33
str: d+ 0, d- 32
ko: d+ 0, d- 20
ba: d+ 10, d- 14
DELTA::: 19/10
min value: 3331.95?
What I observe that does not make sense to me:
often, neither delta_plus nor delta_minus for a teacher equals 0, DELTA is bigger than 1
adding constraint 'DELTA <= 1' results in a smaller objective function value, faster computation and observation 1 cannot be observed anymore
Also: the computation takes forever (although this is not the point of this)
I am happy for any sort of help!
Edit:
As suggested by alias, changing the delta+/- variables to Real and removing the two ToReal() calls yields the desired result. If you look at the expressions generated for my sample input, there are in fact slight differences (besides the different datatype and the missing to_real statements).
For example, consider the constraint that is supposed to force delta_plus - delta_minus of 'fr' to equal 16 - 36 if he works for event 8, and 0 - 36 if he doesn't.
My old code using integers and ToReal-conversions produces this expression:
(assert (= (- d+_fr d-_fr) (- (+ (ite e8_fr 16 0)) 36)))
The code using Reals and no type-conversions produces this:
(assert (let ((a!1 (to_real (- (+ (ite e8_fr 16 0)) 36))))
(= (- d+_fr d-_fr) a!1)))
Also the minimization expressions are slightly different:
My old code using integers and ToReal-conversions produces this expression:
(minimize (let (
(a!1 ...)
(a!2 (...))
(a!3 (...))
)
(+ (* 1.0 (/ a!1 14.0)) (* 1.0 DELTA) a!2 a!3)))
The code using Reals and no type-conversions produces this:
(minimize (let (
(a!1 (/ ... 14.0))
(a!2 (...))
(a!3 (...))
)
(+ (* 1.0 a!1) (* 1.0 DELTA) a!2 a!3)))
Sadly, I don't really know how to read this, but the two versions look essentially the same to me.

Use a custom kernel / image filter to find a specific pattern in a 2d array

Given an image im,
>>> np.random.seed(0)
>>> im = np.random.randint(0, 100, (10,5))
>>> im
array([[44, 47, 64, 67, 67],
[ 9, 83, 21, 36, 87],
[70, 88, 88, 12, 58],
[65, 39, 87, 46, 88],
[81, 37, 25, 77, 72],
[ 9, 20, 80, 69, 79],
[47, 64, 82, 99, 88],
[49, 29, 19, 19, 14],
[39, 32, 65, 9, 57],
[32, 31, 74, 23, 35]])
what is the best way to find a specific segment of this image, for instance
>>> im[6:9, 2:5]
array([[82, 99, 88],
[19, 19, 14],
[65, 9, 57]])
If the specific combination does not exist (maybe due to noise), I would like to have a similarity measure, which searches for segments with a similar distribution and tells me for each pixel of im, how good the agreement is. For instance something like
array([[0.03726647, 0.14738364, 0.04331007, 0.02704363, 0.0648282 ],
[0.02993497, 0.04446428, 0.0772978 , 0.1805197 , 0.08999 ],
[0.12261269, 0.18046972, 0.01985607, 0.19396181, 0.13062801],
[0.03418192, 0.07163043, 0.15013723, 0.12156613, 0.06500945],
[0.00768509, 0.12685481, 0.19178985, 0.13055806, 0.12701177],
[0.19905991, 0.11637007, 0.08287372, 0.0949395 , 0.12470202],
[0.06760152, 0.13495046, 0.06344035, 0.1556691 , 0.18991421],
[0.13250537, 0.00271433, 0.12456922, 0.97 , 0.194389 ],
[0.17563869, 0.10192488, 0.01114294, 0.09023184, 0.00399753],
[0.08834218, 0.19591735, 0.07188889, 0.09617871, 0.13773224]])
The example code is python.
I think there should be a solution that correlates a kernel with im. This has the issue, though, that a segment with the same values but scaled up will give a stronger response.
Template matching would be one of the ways to go about it. Of course, deep learning/ML can also be used for more complicated matching.
Most image processing libraries support some sort of matching function that compares two images: the reference and the one to match. In OpenCV it returns a score which can be used to determine a match. The matching method uses various functions that support scale and/or rotation invariant matching. Beware of licensing constraints in the method you plan to use.
In case the images may not always be exact, you can use the standard deviation (StdDev) to allow for a permissible deviation and still classify them into buckets. Histogram matching may also be used, depending on the condition of the image to be matched (lighting and color can be important, unless you use specific channels). Using a histogram avoids matching the template in its entirety.
Ref for Template Matching:
OpenCV - https://docs.opencv.org/master/d4/dc6/tutorial_py_template_matching.html
scikit-image - https://scikit-image.org/docs/dev/auto_examples/features_detection/plot_template.html
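To make the template-matching idea concrete without any library dependency, here is a minimal pure-Python sketch of zero-mean normalized cross-correlation (ZNCC), one of the scores commonly used for template matching. The helper name `zncc_map` is hypothetical and this is not the OpenCV or scikit-image implementation; it only evaluates offsets where the template fits entirely inside the image. The image is the `im` array from the question.

```python
from math import sqrt


def zncc_map(image, template):
    """ZNCC score of `template` at every offset where it fits inside `image`."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    t_flat = [v for row in template for v in row]
    t_mean = sum(t_flat) / len(t_flat)
    t_dev = [v - t_mean for v in t_flat]
    t_norm = sqrt(sum(d * d for d in t_dev))
    scores = []
    for r in range(ih - th + 1):
        row_scores = []
        for c in range(iw - tw + 1):
            patch = [image[r + i][c + j] for i in range(th) for j in range(tw)]
            p_mean = sum(patch) / len(patch)
            p_dev = [v - p_mean for v in patch]
            denom = sqrt(sum(d * d for d in p_dev)) * t_norm
            # A constant patch or template has no structure to correlate with.
            score = sum(p * t for p, t in zip(p_dev, t_dev)) / denom if denom else 0.0
            row_scores.append(score)
        scores.append(row_scores)
    return scores


im = [[44, 47, 64, 67, 67], [9, 83, 21, 36, 87], [70, 88, 88, 12, 58],
      [65, 39, 87, 46, 88], [81, 37, 25, 77, 72], [9, 20, 80, 69, 79],
      [47, 64, 82, 99, 88], [49, 29, 19, 19, 14], [39, 32, 65, 9, 57],
      [32, 31, 74, 23, 35]]
template = [row[2:5] for row in im[6:9]]

scores = zncc_map(im, template)
best = max(((r, c) for r in range(len(scores)) for c in range(len(scores[0]))),
           key=lambda rc: scores[rc[0]][rc[1]])
print(best)  # (6, 2), the top-left corner of the template
```

Because both the patch and the template are mean-centred and divided by their norms, a patch that is a brightened or scaled copy of the template scores the same as an exact copy, which addresses the scaling concern raised in the question.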
Thanks to banerjk for the great answer - template matching is exactly the solution!
some backup method
Considering my correlating-with-a-kernel idea, there is some progress:
When one correlates the image with the template (i.e. what I called the target segment in the question), chances are high that the most intense point in the correlated image (relative to the mean intensity) matches the template position (see im and m in the example). It seems I am not the first to come up with this idea, as can be seen in these lecture notes on page 39.
However, this is not always true. This method, more or less, just puts weight on the largest values in the template. In the example, im2 is constructed such that it tricks this approach.
Maybe it gets more reliable if one applies some filter (for instance a median filter) to the image beforehand.
I just wanted to mention it here, as it might have advantages in certain situations (it should be more performant than the Wikipedia implementation of template matching).
example
import numpy as np
from scipy import ndimage
np.random.seed(0)
im = np.random.randint(0, 100, (10,5))
t = im[6:9, 2:5]
print('t', t, sep='\n')
m = ndimage.correlate(im, t) / ndimage.correlate(im, np.ones(t.shape))
m /= np.amax(m)
print('im', im, sep='\n')
print('m', m, sep='\n')
print("this can be 'tricked', however")
im2 = im.copy()
im2[6:9, :3] = 0
im2[6,1] = 1
m2 = ndimage.correlate(im2, t) / ndimage.correlate(im2, np.ones(t.shape))
m2 /= np.amax(m2)
print('im2', im2, sep='\n')
print('m2', m2, sep='\n')
output
t
[[82 99 88]
[19 19 14]
[65 9 57]]
im
[[44 47 64 67 67]
[ 9 83 21 36 87]
[70 88 88 12 58]
[65 39 87 46 88]
[81 37 25 77 72]
[ 9 20 80 69 79]
[47 64 82 99 88]
[49 29 19 19 14]
[39 32 65 9 57]
[32 31 74 23 35]]
m
[[0.73776208 0.62161208 0.74504705 0.71202601 0.66743979]
[0.70809611 0.70617161 0.70284942 0.80653741 0.67067733]
[0.55047727 0.61675268 0.5937487 0.70579195 0.74351706]
[0.7303857 0.77147963 0.74809273 0.59136392 0.61324214]
[0.70041161 0.7717032 0.69220064 0.72463532 0.6957257 ]
[0.89696894 0.69741108 0.64136612 0.64154719 0.68621613]
[0.48509474 0.60700037 0.65812918 0.68441118 0.68835903]
[0.73802038 0.83224745 0.87301124 1. 0.92272565]
[0.72708573 0.64909142 0.54540817 0.60859883 0.52663327]
[0.72061572 0.70357846 0.61626289 0.71932261 0.75028955]]
this can be 'tricked', however
im2
[[44 47 64 67 67]
[ 9 83 21 36 87]
[70 88 88 12 58]
[65 39 87 46 88]
[81 37 25 77 72]
[ 9 20 80 69 79]
[ 0 1 0 99 88]
[ 0 0 0 19 14]
[ 0 0 0 9 57]
[32 31 74 23 35]]
m2
[[0.53981867 0.45483201 0.54514907 0.52098765 0.48836403]
[0.51811216 0.51670401 0.51427317 0.59014141 0.49073293]
[0.40278285 0.4512764 0.43444444 0.51642621 0.54402958]
[0.5344214 0.56448972 0.54737758 0.43269951 0.44870774]
[0.51248943 0.56465331 0.50648148 0.53021386 0.50906076]
[0.78923691 0.56633529 0.51641414 0.44336403 0.50210263]
[0.88137788 0.89779614 0.63552189 0.55070797 0.50367059]
[0.88888889 1. 0.75544508 0.75694003 0.67515605]
[0.43965976 0.48492221 0.37490287 0.48511085 0.38533625]
[0.30754918 0.32478065 0.27066895 0.46685032 0.548985 ]]
Maybe someone can contribute on the background of the lecture notes.
update: It is discussed in J. P. Lewis, “Fast Normalized Cross-Correlation”, Industrial Light and Magic. on the very first page.

How to parse animations.samplers.input from gltf

The spec explains the animations.samplers.input property as:
The index of an accessor containing keyframe input values, e.g., time. That accessor must have componentType FLOAT. The values represent time in seconds with time[0] >= 0.0, and strictly increasing values, i.e., time[n + 1] > time[n].
However, I'm having a bit of trouble understanding this from the first basic example on the demo repo, Animated Triangle
Specifically, if we bring the relevant binary data for the animation from animation.bin and decode it into a Float32Array, we get the following list of values:
[0, 0.25, 0.5, 0.75, 1, 0, 0, 0, 1, 0, 0, 0.7070000171661377, 0.7070000171661377, 0, 0, 1, 0, 0, 0, 0.7070000171661377, -0.7070000171661377, 0, 0, 0, 1]
This of course does not make sense in light of "strictly increasing values".
What am I misunderstanding here? How are these values meant to be used (in combination with output) in order to update the rotation over time?
Note that animation.bin is the view referenced from the input sampler. In other words, from the gltf
input == accessor 2
accessor 2 == bufferView 2
bufferView 2 == bytes(0-100) from buffer 1
buffer 1 == animation.bin
You've decoded too far. Although bufferView 2 is bytes 0 to 100, accessor 2 does not call for all those bytes. Here's accessor 2:
{
  "bufferView" : 2,
  "byteOffset" : 0,
  "componentType" : 5126,
  "count" : 5,
  "type" : "SCALAR",
  "max" : [ 1.0 ],
  "min" : [ 0.0 ]
},
Note the count: 5 in there. Count is defined as:
The number of attributes referenced by this accessor, not to be confused with the number of bytes or number of components.
So, accessor 2 is the first five SCALAR values from offset 0 in bufferView 2, namely the first five numbers from your decoded output above:
[0, 0.25, 0.5, 0.75, 1]
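A minimal Python sketch of reading an accessor this way, using only `struct`. The 100-byte buffer is rebuilt from the floats quoted in the question so the example is self-contained; `read_float_accessor` is a hypothetical helper that assumes tightly packed data (no byteStride). The output accessor's offset of 20 bytes and VEC4 type are assumptions inferred from the remaining 20 floats, not values quoted in this thread.

```python
import struct

# The 25 floats from the question, packed back into 100 little-endian bytes
# to stand in for the contents of animation.bin.
values = [0, 0.25, 0.5, 0.75, 1,
          0, 0, 0, 1,
          0, 0, 0.707, 0.707,
          0, 0, 1, 0,
          0, 0, 0.707, -0.707,
          0, 0, 0, 1]
buffer_view = struct.pack("<25f", *values)

COMPONENTS = {"SCALAR": 1, "VEC3": 3, "VEC4": 4}


def read_float_accessor(view, byte_offset, count, type_name):
    """Read `count` elements of componentType 5126 (FLOAT) from a bufferView."""
    n = COMPONENTS[type_name]
    flat = struct.unpack_from("<%df" % (count * n), view, byte_offset)
    return [flat[i * n:(i + 1) * n] for i in range(count)]


# Accessor 2: byteOffset 0, count 5, type SCALAR -> the keyframe times.
times = [t[0] for t in read_float_accessor(buffer_view, 0, 5, "SCALAR")]
print(times)  # [0.0, 0.25, 0.5, 0.75, 1.0]

# Assumption: the output accessor starts right after the times, 5 x VEC4.
rotations = read_float_accessor(buffer_view, 20, 5, "VEC4")
for t, q in zip(times, rotations):
    print(t, q)
```

Read this way, each keyframe time pairs with one quaternion, which is how input and output combine to drive the rotation over time.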
FWIW, there are tools to help investigate glTF binary files. Here's the "Peek Definition" function from VSCode's glTF extension:
(Disclaimer, I'm one of the authors of this extension, although I did not write this decode feature myself).

What is correct implementation of LDA (Linear Discriminant Analysis)?

I found that the result of LDA in OpenCV is different from other libraries. For example, the input data was
DATA (13 data samples with 4 dimensions)
7 26 6 60
1 29 15 52
11 56 8 20
11 31 8 47
7 52 6 33
11 55 9 22
3 71 17 6
1 31 22 44
2 54 18 22
21 47 4 26
1 40 23 34
11 66 9 12
10 68 8 12
LABEL
0 1 2 0 1 2 0 1 2 0 1 2 0
The OpenCV code is
Mat data = (Mat_<float>(13, 4) <<\
7, 26, 6, 60,\
1, 29, 15, 52,\
11, 56, 8, 20,\
11, 31, 8, 47,\
7, 52, 6, 33,\
11, 55, 9, 22,\
3, 71, 17, 6,\
1, 31, 22, 44,\
2, 54, 18, 22,\
21, 47, 4, 26,\
1, 40, 23, 34,\
11, 66, 9, 12,\
10, 68, 8, 12);
Mat mean;
reduce(data, mean, 0, CV_REDUCE_AVG);
mean.convertTo(mean, CV_64F);
Mat label(data.rows, 1, CV_32SC1);
for (int i=0; i<label.rows; i++)
    label.at<int>(i) = i%3;
LDA lda(data, label);
Mat projection = lda.subspaceProject(lda.eigenvectors(), mean, data);
The matlab code is (used Matlab Toolbox for Dimensionality Reduction)
cd drtoolbox\techniques\
load hald
label=[0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
[projection, trainedlda] = lda(ingredients, label)
The eigenvectors are
OpenCV (lda.eigenvectors())
0.4457 4.0132
0.4880 3.5703
0.5448 3.3466
0.5162 3.5794
Matlab Toolbox for Dimensionality Reduction (trainedlda.M)
0.5613 0.7159
0.6257 0.6203
0.6898 0.5884
0.6635 0.6262
Then the projections of data are
OpenCV
1.3261 7.1276
0.8892 -4.7569
-1.8092 -6.1947
-0.0720 1.1927
0.0768 3.3105
-0.7200 0.7405
-0.3788 -4.7388
1.5490 -2.8255
-0.3166 -8.8295
-0.8259 9.8953
1.3239 -3.1406
-0.5140 4.2194
-0.5285 4.0001
Matlab Toolbox for Dimensionality Reduction
1.8030 1.3171
1.2128 -0.8311
-2.3390 -1.0790
-0.0686 0.3192
0.1583 0.5392
-0.9479 0.1414
-0.5238 -0.9722
1.9852 -0.4809
-0.4173 -1.6266
-1.1358 1.9009
1.6719 -0.5711
-0.6996 0.7034
-0.6993 0.6397
The eigenvectors and projections are different even though these LDAs have the same data. I believe there are 2 possibilities.
One of the libraries is wrong.
I am doing it wrong.
Thank you!
The difference is because eigenvectors are not normalized.
The normalized (L2 norm) eigenvectors are
OpenCV
0.44569 0.55196
0.48798 0.49105
0.54478 0.46028
0.51618 0.49230
Matlab Toolbox for Dimensionality Reduction
0.44064 0.55977
0.49120 0.48502
0.54152 0.46008
0.52087 0.48963
They look similar now, although they have quite different eigenvalues.
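The normalization can be reproduced from the rounded eigenvector tables in the question with a few lines of Python. This is only a check of the arithmetic; `normalize_columns` is a hypothetical helper, and because the inputs are rounded to four decimals the results agree with the tables above only to about three decimals.

```python
from math import sqrt


def normalize_columns(matrix):
    """Scale every column of a row-major matrix to unit L2 norm."""
    norms = [sqrt(sum(v * v for v in col)) for col in zip(*matrix)]
    return [[v / n for v, n in zip(row, norms)] for row in matrix]


opencv = [[0.4457, 4.0132], [0.4880, 3.5703], [0.5448, 3.3466], [0.5162, 3.5794]]
toolbox = [[0.5613, 0.7159], [0.6257, 0.6203], [0.6898, 0.5884], [0.6635, 0.6262]]

for name, vectors in (("OpenCV", opencv), ("drtoolbox", toolbox)):
    print(name)
    for row in normalize_columns(vectors):
        print("  ".join("%.4f" % v for v in row))
```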
Even though the PCA in OpenCV returns normalized eigenvectors, LDA does not. My next question is 'Is normalizing eigenvectors in LDA not necessary?'

Best way to convert bit offset to an integer [duplicate]

I have a 64-bit unsigned integer with exactly 1 bit set. I’d like to assign a value to each of the possible 64 values (in this case, the odd primes, so 0x1 corresponds to 3, 0x2 corresponds to 5, …, 0x8000000000000000 corresponds to 313).
It seems like the best way would be to convert 1 → 0, 2 → 1, 4 → 2, 8 → 3, …, 2^63 → 63 and look up the values in an array. But even if that’s so, I’m not sure what the fastest way to get at the binary exponent is. And there may be more efficient ways, still.
This operation will be used 10^14 to 10^16 times, so performance is a serious issue.
Finally an optimal solution. See the end of this section for what to do when the input is guaranteed to have exactly one non-zero bit: http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogDeBruijn
Here's the code:
static const int MultiplyDeBruijnBitPosition2[32] =
{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition2[(uint32_t)(v * 0x077CB531U) >> 27];
You may be able to adapt this to a direct multiplication-based algorithm for 64-bit inputs; otherwise, simply add one conditional to see if the bit is in the upper 32 positions or the lower 32 positions, then use the 32-bit algorithm here.
Update: Here's at least one 64-bit version I just developed myself, but it uses division (actually modulo).
r = Table[v%67];
For each power of 2, v%67 has a distinct value, so just put your odd primes (or bit indices if you don't want the odd-prime thing) at the right positions in the table. 3 positions (0, 17, and 34) are not used, which might be convenient if you also want to accept all-bits-zero as an input.
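The claim about `v % 67` is easy to check with a short Python sketch that builds the 67-entry table for the odd primes and lists the unused slots. The names `odd_primes`, `TABLE` and `prime_for` are hypothetical, and 0 is used here as the "unused" marker by assumption.

```python
def odd_primes(count):
    """Return the first `count` odd primes by trial division."""
    primes, candidate = [], 3
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 2
    return primes


PRIMES = odd_primes(64)

# Slot (2**bit) % 67 holds the prime for that bit; unused slots stay 0.
TABLE = [0] * 67
for bit in range(64):
    TABLE[(1 << bit) % 67] = PRIMES[bit]


def prime_for(v):
    """Map a 64-bit value with exactly one bit set to its odd prime."""
    return TABLE[v % 67]


unused = [slot for slot, prime in enumerate(TABLE) if prime == 0]
print(unused)                                              # [0, 17, 34]
print(prime_for(0x1), prime_for(0x2), prime_for(1 << 63))  # 3 5 313
```

With this table an all-zero input lands on slot 0 and returns the marker value, matching the remark about accepting all-bits-zero as an input.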
Update 2: 64-bit version.
r = Table[(uint64_t)(val * 0x022fdd63cc95386dull) >> 58];
This is my original work, but I got the B(2,6) De Bruijn sequence from this chess site so I can't take credit for anything but figuring out what a De Bruijn sequence is and using Google. ;-)
Some additional remarks on how this works:
The magic number is a B(2,6) De Bruijn sequence. It has the property that, if you look at a 6-consecutive-bit window, you can obtain any six-bit value in that window by rotating the number appropriately, and that each possible six-bit value is obtained by exactly one rotation.
We fix the window in question to be the top 6 bit positions, and choose a De Bruijn sequence with 0's in the top 6 bits. This makes it so we never have to deal with bit rotations, only shifts, since 0's will come into the bottom bits naturally (and we could never end up looking at more than 5 bits from the bottom in the top-6-bits window).
Now, the input value of this function is a power of 2. So multiplying the De Bruijn sequence by the input value performs a bitshift by log2(value) bits. We now have in the upper 6 bits a number which uniquely determines how many bits we shifted by, and can use that as an index into a table to get the actual length of the shift.
This same approach can be used for arbitrarily-large or arbitrarily-small integers, as long as you're willing to implement the multiplication. You simply have to find a B(2,k) De Bruijn sequence where 2^k is the number of bits (k = 6 for 64-bit values). The chess wiki link I provided above has De Bruijn sequences for values of k ranging from 1 to 6, and some quick Googling shows there are a few papers on optimal algorithms for generating them in the general case.
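The explanation above can be checked with a small Python model of the 64-bit multiply-and-shift. Python integers are unbounded, so the product is masked to 64 bits to mimic `uint64_t` wrap-around. The names `INDEX` and `bit_index` are hypothetical; the constant is the one given in this answer.

```python
DEBRUIJN64 = 0x022FDD63CC95386D
MASK64 = (1 << 64) - 1

# Multiplying by 2**bit shifts the sequence left by `bit`; the top 6 bits of
# the 64-bit result identify the shift, so they serve as the table slot.
INDEX = [None] * 64
for bit in range(64):
    slot = ((DEBRUIJN64 << bit) & MASK64) >> 58
    assert INDEX[slot] is None, "constant is not a De Bruijn sequence"
    INDEX[slot] = bit


def bit_index(v):
    """Index of the single set bit in v (v must be a power of two below 2**64)."""
    return INDEX[((v * DEBRUIJN64) & MASK64) >> 58]


print(bit_index(1), bit_index(8), bit_index(1 << 63))  # 0 3 63
```

To return the odd primes directly, the table entries can hold the primes instead of the bit indices, which saves the second lookup.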
If performance is a serious issue, then you should use intrinsics/builtins to use CPU specific instructions, such as the ones found here for GCC:
http://gcc.gnu.org/onlinedocs/gcc-4.5.0/gcc/Other-Builtins.html
Built-in function int __builtin_ffs(unsigned int x).
Returns one plus the index of the least significant 1-bit of x, or if x is zero, returns zero.
Built-in function int __builtin_clz(unsigned int x).
Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined.
Built-in function int __builtin_ctz(unsigned int x).
Returns the number of trailing 0-bits in x, starting at the least significant bit position. If x is 0, the result is undefined.
Things like this are the core of many O(1) algorithms, such as kernel schedulers which need to find the first non-empty queue signified by an array of bits.
Note: I’ve listed the unsigned int versions, but GCC has unsigned long long versions, as well.
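For testing a C implementation against the semantics quoted above, the same quantities can be modelled in Python with `int.bit_length`. This is a reference model for 64-bit inputs rather than a performance suggestion; the function names are hypothetical, and raising an exception stands in for the "undefined" case.

```python
def ffs64(v):
    """One plus the index of the least significant 1-bit, or 0 if v is zero."""
    return (v & -v).bit_length()


def ctz64(v):
    """Number of trailing 0-bits; the builtin leaves v == 0 undefined."""
    if v == 0:
        raise ValueError("ctz is undefined for 0")
    return (v & -v).bit_length() - 1


def clz64(v):
    """Number of leading 0-bits in a 64-bit word; undefined for v == 0."""
    if v == 0:
        raise ValueError("clz is undefined for 0")
    return 64 - v.bit_length()


print(ffs64(8), ctz64(8), clz64(8))  # 4 3 60
```

For an input with exactly one bit set, `ctz64(v)` and `63 - clz64(v)` both give the bit index needed for the table lookup.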
You could use a binary search technique:
int pos = 0;
if ((value & 0xffffffff) == 0) {
    pos += 32;
    value >>= 32;
}
if ((value & 0xffff) == 0) {
    pos += 16;
    value >>= 16;
}
if ((value & 0xff) == 0) {
    pos += 8;
    value >>= 8;
}
if ((value & 0xf) == 0) {
    pos += 4;
    value >>= 4;
}
if ((value & 0x3) == 0) {
    pos += 2;
    value >>= 2;
}
if ((value & 0x1) == 0) {
    pos += 1;
}
This has the advantage over loops that the loop is already unrolled. However, if this is really performance critical, you will want to test and measure every proposed solution.
Some architectures (a surprising number, actually) have a single instruction that can do the calculation you want. On ARM it would be the CLZ (count leading zeroes) instruction. On Intel, the BSF (bit-scan forward) or BSR (bit-scan reverse) instruction would help you out.
I guess this isn't really a C answer, but it will get you the speed you need!
precalculate 1 << i (for i = 0..63) and store them in an array
use a binary search to find the index into the array of a given value
look up the prime number in another array using this index
Compared to the other answer I posted here, this should only take 6 steps to find the index (as opposed to a maximum of 64). But it's not clear to me whether one step of this answer is not more time consuming than just bit shifting and incrementing a counter. You may want to try out both though.
See http://graphics.stanford.edu/~seander/bithacks.html - specifically "Finding integer log base 2 of an integer (aka the position of the highest bit set)" - for some alternative algorithms. (If you're really serious about speed, you might consider ditching C if your CPU has a dedicated instruction.)
Since speed, presumably not memory usage, is important, here's a crazy idea:
w1 = 1st 16 bits
w2 = 2nd 16 bits
w3 = 3rd 16 bits
w4 = 4th 16 bits
result = array1[w1] + array2[w2] + array3[w3] + array4[w4]
where array1..4 are sparsely populated 64K arrays that contain the actual prime values (and zero in the positions that don't correspond to bit positions)
@R's solution is excellent; this is just the 64-bit variant, with the table already calculated:
static inline unsigned char bit_offset(unsigned long long self) {
static const unsigned char mapping[64] = {
[0]=0, [1]=1, [2]=2, [4]=3, [8]=4, [17]=5, [34]=6, [5]=7,
[11]=8, [23]=9, [47]=10, [31]=11, [63]=12, [62]=13, [61]=14, [59]=15,
[55]=16, [46]=17, [29]=18, [58]=19, [53]=20, [43]=21, [22]=22, [44]=23,
[24]=24, [49]=25, [35]=26, [7]=27, [15]=28, [30]=29, [60]=30, [57]=31,
[51]=32, [38]=33, [12]=34, [25]=35, [50]=36, [36]=37, [9]=38, [18]=39,
[37]=40, [10]=41, [21]=42, [42]=43, [20]=44, [41]=45, [19]=46, [39]=47,
[14]=48, [28]=49, [56]=50, [48]=51, [33]=52, [3]=53, [6]=54, [13]=55,
[27]=56, [54]=57, [45]=58, [26]=59, [52]=60, [40]=61, [16]=62, [32]=63
};
return mapping[((self & -self) * 0x022FDD63CC95386DULL) >> 58];
}
I built the table using the provided mask.
>>> ', '.join('[{0}]={1}'.format(((2**bit * 0x022fdd63cc95386d) % 2**64) >> 58, bit) for bit in xrange(64))
'[0]=0, [1]=1, [2]=2, [4]=3, [8]=4, [17]=5, [34]=6, [5]=7, [11]=8, [23]=9, [47]=10, [31]=11, [63]=12, [62]=13, [61]=14, [59]=15, [55]=16, [46]=17, [29]=18, [58]=19, [53]=20, [43]=21, [22]=22, [44]=23, [24]=24, [49]=25, [35]=26, [7]=27, [15]=28, [30]=29, [60]=30, [57]=31, [51]=32, [38]=33, [12]=34, [25]=35, [50]=36, [36]=37, [9]=38, [18]=39, [37]=40, [10]=41, [21]=42, [42]=43, [20]=44, [41]=45, [19]=46, [39]=47, [14]=48, [28]=49, [56]=50, [48]=51, [33]=52, [3]=53, [6]=54, [13]=55, [27]=56, [54]=57, [45]=58, [26]=59, [52]=60, [40]=61, [16]=62, [32]=63'
should the compiler complain:
>>> ', '.join(map(str, {((2**bit * 0x022fdd63cc95386d) % 2**64) >> 58: bit for bit in xrange(64)}.values()))
'0, 1, 2, 53, 3, 7, 54, 27, 4, 38, 41, 8, 34, 55, 48, 28, 62, 5, 39, 46, 44, 42, 22, 9, 24, 35, 59, 56, 49, 18, 29, 11, 63, 52, 6, 26, 37, 40, 33, 47, 61, 45, 43, 21, 23, 58, 17, 10, 51, 25, 36, 32, 60, 20, 57, 16, 50, 31, 19, 15, 30, 14, 13, 12'
^^^^ assumes that we iterate over sorted keys, this may not be the case in the future ...
unsigned char bit_offset(unsigned long long self) {
static const unsigned char table[64] = {
0, 1, 2, 53, 3, 7, 54, 27, 4, 38, 41, 8, 34, 55, 48,
28, 62, 5, 39, 46, 44, 42, 22, 9, 24, 35, 59, 56, 49,
18, 29, 11, 63, 52, 6, 26, 37, 40, 33, 47, 61, 45, 43,
21, 23, 58, 17, 10, 51, 25, 36, 32, 60, 20, 57, 16, 50,
31, 19, 15, 30, 14, 13, 12
};
return table[((self & -self) * 0x022FDD63CC95386DULL) >> 58];
}
simple test:
>>> table = {((2**bit * 0x022fdd63cc95386d) % 2**64) >> 58: bit for bit in xrange(64)}.values()
>>> assert all(i == table[(2**i * 0x022fdd63cc95386d % 2**64) >> 58] for i in xrange(64))
Short of using assembly or compiler-specific extensions to find the first/last bit that's set, the fastest algorithm is a binary search. First check if any of the first 32 bits are set. If so, check if any of the first 16 are set. If so, check if any of the first 8 are set. Etc. Your function to do this can directly return an odd prime at each leaf of the search, or it can return a bit index which you use as an array index into a table of odd primes.
Here's a loop implementation for the binary search, which the compiler could certainly unroll if that's deemed to be optimal:
uint32_t mask=0xffffffff;
int pos=0, shift=32, i;
for (i=6; i; i--) {
    if (!(val&mask)) {
        val>>=shift;
        pos+=shift;
    }
    shift>>=1;
    mask>>=shift;
}
val is assumed to be uint64_t, but to optimize this for 32-bit machines, you should special-case the first check, then perform the loop with a 32-bit val variable.
Call the GNU POSIX extension function ffsll, found in glibc. If the function isn't present, fall back on __builtin_ffsll. Both functions return the index + 1 of the first bit set, or zero. With Visual-C++, you can use _BitScanForward64.
unsigned bit_position = 0;
while ((value & 1) == 0)
{
    ++bit_position;
    value >>= 1;
}
Then look up the primes based on bit_position as you say.
You may find that log(n) / log(2) gives you the 0, 1, 2, ... you're after in a reasonable timeframe. Otherwise, some form of hashtable based approach could be useful.
Another answer assuming IEEE float:
int get_bit_index(uint64_t val)
{
    union { float f; uint32_t i; } u = { val };
    return (u.i>>23)-127;
}
It works as specified for the input values you asked for (exactly 1 bit set) and also has useful behavior for other values (try to figure out exactly what that behavior is). No idea if it's fast or slow; that probably depends on your machine and compiler.
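The same trick can be modelled in Python by round-tripping the value through a 32-bit float with `struct`. This is a sketch for exploring the behaviour, with the function name reused from the C snippet. It assumes non-negative inputs below 2^64, so the sign bit is always 0.

```python
import struct


def get_bit_index(val):
    """Unbiased exponent of val after conversion to IEEE-754 single precision."""
    bits = struct.unpack("<I", struct.pack("<f", float(val)))[0]
    # Bits 23..30 hold the biased exponent; the bias is 127.
    return (bits >> 23) - 127


print(get_bit_index(1), get_bit_index(2), get_bit_index(1 << 63))  # 0 1 63
print(get_bit_index(3))                                            # 1
```

For inputs that are not powers of two, the result is the exponent of the rounded float. A single-precision float keeps only 24 significant bits, so large values may be rounded up to the next power of two and report an index one higher than the highest set bit. An input of 0 yields -127.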
From the GnuChess source:
unsigned char leadz (BitBoard b)
/**************************************************************************
*
* Returns the leading bit in a bitboard. Leftmost bit is 0 and
* rightmost bit is 63. Thanks to Robert Hyatt for this algorithm.
*
***************************************************************************/
{
if (b >> 48) return lzArray[b >> 48];
if (b >> 32) return lzArray[b >> 32] + 16;
if (b >> 16) return lzArray[b >> 16] + 32;
return lzArray[b] + 48;
}
Here lzArray is a pregenerated array of size 2^16. This'll save you 50% of the operations compared to a full binary search.
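The source does not show how lzArray is generated. A minimal Python sketch, assuming each entry holds the number of leading zero bits of its index viewed as a 16-bit word, and assuming entry 0 is 16 so that an empty board yields 64:

```python
# Assumption: LZ_ARRAY[x] = leading zero count of x as a 16-bit word.
LZ_ARRAY = [16 - x.bit_length() for x in range(1 << 16)]


def leadz(b):
    """Leading bit of a 64-bit board; leftmost bit is 0, rightmost is 63."""
    if b >> 48:
        return LZ_ARRAY[b >> 48]
    if b >> 32:
        return LZ_ARRAY[b >> 32] + 16
    if b >> 16:
        return LZ_ARRAY[b >> 16] + 32
    return LZ_ARRAY[b] + 48


print(leadz(1 << 63), leadz(1), leadz(0))  # 0 63 64
```

Since this routine numbers bits from the left, the index counted from the least significant bit, as used elsewhere on this page, is `63 - leadz(b)`.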
This is for 32-bit values in Java, but it should be possible to adapt it to 64 bit.
I assume this will be the fastest because there is no branching involved.
static public final int msb(int n) {
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    n >>>= 1;
    n += 1;
    return n;
}
static public final int msb_index(int n) {
    final int[] multiply_de_bruijn_bit_position = {
        0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
        31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
    };
    return multiply_de_bruijn_bit_position[(msb(n) * 0x077CB531) >>> 27];
}
Here is more information from: http://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightMultLookup
// Count the consecutive zero bits (trailing) on the right with multiply and lookup
unsigned int v; // find the number of trailing zeros in 32-bit v
int r; // result goes here
static const int MultiplyDeBruijnBitPosition[32] =
{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition[((uint32_t)((v & -v) * 0x077CB531U)) >> 27];
// Converting bit vectors to indices of set bits is an example use for this.
// It requires one more operation than the earlier one involving modulus
// division, but the multiply may be faster. The expression (v & -v) extracts
// the least significant 1 bit from v. The constant 0x077CB531UL is a de Bruijn
// sequence, which produces a unique pattern of bits into the high 5 bits for
// each possible bit position that it is multiplied against. When there are no
// bits set, it returns 0. More information can be found by reading the paper
// Using de Bruijn Sequences to Index 1 in a Computer Word by
// Charles E. Leiserson, Harald Prokop, and Keith H. Randall.
and as last:
http://supertech.csail.mit.edu/papers/debruijn.pdf
