Histogram of a value for a variable over unique values of another variable - histogram

I have a dataset in Stata with 203 rows and 2 columns. Here are some rows:
Voting Bidvalue
0 720
1 15
0 120
0 960
1 30
1 400
0 60
0 960
0 240
There are eight different bid values including 15, 30, 60, 120, 240, 360, 480, 720. For each bid value, we can count the number of 1s and 0s in the Voting column. Here is a screenshot of the detailed information.
I want a histogram whose x-axis shows the 8 bid values and whose y-axis shows the number of 1s in the Voting column.

histogram Bidvalue if Voting == 1, freq
is one answer to the question. More likely, though, you want something like
egen Toshow = group(Bidvalue), label
label var Toshow "Bidvalue"
histogram Toshow if Voting == 1, xla(1/8, valuelabel) discrete freq

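The following works for me. Because Voting is coded 0/1, its sum within each bid value is the number of 1s, whereas (count) would count every nonmissing observation:
graph bar (sum) Voting, over(Bidvalue)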
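For readers who want to check the bar heights outside Stata, here is a minimal Python sketch of the same tally. It uses only the nine sample rows shown in the question, so the variable names and the data subset are illustrative, not the full 203-row dataset.

```python
from collections import Counter

# Sample rows copied from the question: (Voting, Bidvalue)
rows = [(0, 720), (1, 15), (0, 120), (0, 960), (1, 30),
        (1, 400), (0, 60), (0, 960), (0, 240)]

# Count the rows with Voting == 1 for each bid value
ones_per_bid = Counter(bid for voting, bid in rows if voting == 1)

# One line per distinct bid value, including those with no 1s
for bid in sorted({bid for _, bid in rows}):
    print(bid, ones_per_bid[bid])
```

Each printed pair corresponds to one bar: the bid value on the x-axis and the number of 1s as its height.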

256 possible values in 8 bits

I am confused when I read the details section that says 1 byte, which is 8 bits, gives us 2^8 or 256 possible values (https://en.wikipedia.org/wiki/8-bit_computing).
If I am doing the math correctly:
2^0 = 1
2^1 = 2
2^2 = 4
2^3 = 8
2^4 = 16
2^5 = 32
2^6 = 64
2^7 = 128
Total = 255
The way I see it, there are 255 possible values in total.
0 is also a value, so for 8 bits the value range is 0-255.
00000000 is the lowest and 11111111 (255) is the highest.
2^x gives the total number of distinct values that x bits can represent, because each bit independently takes one of two states. Use it for x > 0; the x = 0 case is a no-bit scenario, which is irrelevant here.
In your case, summing 2^0 through 2^7 gives 255, which is the largest value 8 bits can hold (all bits set), not the number of values. Counting 0 as well gives 2^8 = 256 possible values.
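The difference between the largest value and the number of values can be checked directly. This is a small sketch; the names `BITS`, `max_value` and `patterns` are made up for the illustration.

```python
BITS = 8

# Sum of the place values 2^0 .. 2^7: the largest value, with every bit set
max_value = sum(2**i for i in range(BITS))

# Every distinct 8-bit pattern, from 00000000 up to 11111111
patterns = [format(n, "08b") for n in range(2**BITS)]

print(max_value)                  # 255
print(len(patterns))              # 256
print(patterns[0], patterns[-1])  # 00000000 11111111
```

The list has one more entry than `max_value` because the all-zero pattern is a value too.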

torch7: Filtering out NaN values

Given any general float torch.Tensor, possibly containing some NaN values, I am looking for an efficient method to either replace all the NaN values in it with zero, or remove them altogether and filter out the "useful" values in another new Tensor.
I am aware that a trivial way to do this is to manually iterate through all the values in the given tensor (and correspondingly replace them with zero or reject them for the new tensor).
Is there some pre-defined Torch function or a combination of functions which can achieve this more efficiently in terms of performance, which relies on the inherent CPU-GPU optimisations of Torch?
Well, it looks like there is no function in torch that checks a tensor for NaNs. But since NaN != NaN, there's a workaround:
a = torch.rand(4, 5)
a[2][3] = tonumber('nan')
nan_mask = a:ne(a)
notnan_mask = a:eq(a)
print(a)
0.2434 0.1731 0.3440 0.3340 0.0519
0.0932 0.4067 nan 0.1827 0.5945
0.3020 0.1035 0.5415 0.3329 0.7881
0.6108 0.9498 0.0406 0.9335 0.3582
[torch.DoubleTensor of size 4x5]
print(nan_mask)
0 0 0 0 0
0 0 1 0 0
0 0 0 0 0
0 0 0 0 0
[torch.ByteTensor of size 4x5]
Having these masks, you can efficiently extract NaN/not NaN values and replace them with whatever you want:
print(a[notnan_mask])
...
[torch.DoubleTensor of size 19]
a[nan_mask] = 42
print(a)
0.2434 0.1731 0.3440 0.3340 0.0519
0.0932 0.4067 42.0000 0.1827 0.5945
0.3020 0.1035 0.5415 0.3329 0.7881
0.6108 0.9498 0.0406 0.9335 0.3582
[torch.DoubleTensor of size 4x5]
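The masks rely only on the IEEE 754 rule that NaN is not equal to itself, so the same idea can be illustrated without Torch. The following is a plain-Python sketch on nested lists. It is not Torch API, it runs element by element rather than vectorized, and the small matrix is illustrative.

```python
nan = float("nan")

# A small 2x3 example with one NaN entry
a = [[0.2434, 0.1731, 0.3440],
     [0.0932, nan, 0.1827]]

# NaN is the only float value for which x != x is True
nan_mask = [[x != x for x in row] for row in a]

# Keep only the non-NaN values, flattened into one list
kept = [x for row in a for x in row if x == x]

# Or replace every NaN with zero, keeping the shape
replaced = [[0.0 if x != x else x for x in row] for row in a]

print(nan_mask)
print(kept)
print(replaced)
```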

Is there a way to use Machine Learning to classify discrete and infinite scale data?

The data looks like this:
x y
7773 0
9805 4
7145 0
7645 1
2529 1
4814 2
6027 2
7499 2
3367 1
8861 5
9776 2
8009 5
3844 2
1218 2
1120 1
4553 0
3017 1
2582 2
1691 2
5342 0
...
The real function f(x) returns the circle count of a decimal integer (the number of closed loops in its digits):
#          0  1  2  3  4  5  6  7  8  9
_f_map = [1, 0, 0, 0, 0, 0, 1, 0, 2, 1]
def f(x):
    x = int(x)
    assert x >= 0
    if x == 0:
        return 1
    r = 0
    while x:
        r += _f_map[x % 10]
        x //= 10  # integer division (plain / yields a float on Python 3)
    return r
The training data and test data can be generated randomly:
import random
data = []
target = []
for i in range(3000):
    x = random.randint(0, 999999)  # hardcode a scale
    data.append([x])
    target.append(f(x))
The real function is discrete and defined on an unbounded (infinite) scale.
Is there a method or a model that can classify this data?
I tried an SVM (Support Vector Machine) and got only a 20% accuracy rate.
This looks like a typical use case for sequence models. You can train an LSTM or another recurrent neural network by treating each number as a sequence of digits fed to the network. The network then only has to learn a sum and a simple mapping (your _f_map).
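To make the "sequence of digits" view concrete, here is a minimal sketch of the input encoding such a model could use. It does not train an LSTM. The helper names (`to_digits`, `digit_counts`, `circle_count`) are hypothetical, and `F_MAP` is copied from the question's `_f_map`.

```python
from collections import Counter

F_MAP = [1, 0, 0, 0, 0, 0, 1, 0, 2, 1]  # copied from the question's _f_map

def to_digits(x):
    """Turn a non-negative integer into its sequence of decimal digits."""
    return [int(ch) for ch in str(int(x))]

def digit_counts(x):
    """Fixed-length feature vector: how often each digit 0-9 occurs in x."""
    counts = Counter(to_digits(x))
    return [counts[d] for d in range(10)]

def circle_count(x):
    """Sum of the per-digit mapping over the digit sequence."""
    return sum(F_MAP[d] for d in to_digits(x))

print(to_digits(9805))     # [9, 8, 0, 5]
print(digit_counts(9805))  # [1, 0, 0, 0, 0, 1, 0, 0, 1, 1]
print(circle_count(9805))  # 4
```

On this representation the target is just a sum of per-digit contributions, which is far easier to learn than a mapping from the raw integer value that the SVM was given.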

Generating the series of numbers 0,3,5,8,10,13,15,18

I want to generate a series of numbers in a loop.
My series will contain numbers like 0,3,5,8,10,13,15,18 and so on.
I tried taking the remainder and adding 2 and 3, but it didn't work out.
Can anyone please help me generate this series?
You can just use an increment which toggles between 3 and 2, e.g.
int i, inc;
/* inc alternates 3, 2, 3, 2, ... because 5 - 3 = 2 and 5 - 2 = 3 */
for (i = 0, inc = 3; i < 1000; i += inc, inc = 5 - inc)
{
    printf("%d\n", i);
}
It looks like the sequence starts at zero and uses alternating increments of 3 and 2. There are several ways of implementing this, but perhaps the simplest is to iterate in increments of 5 (i.e. 3+2) and print two numbers each time: the position and the position plus three.
Here is some pseudocode:
i = 0
REPEAT N times :
    PRINT i
    PRINT i + 3
    i += 5
The iteration i=0 will print 0 and 3
The iteration i=5 will print 5 and 8
The iteration i=10 will print 10 and 13
The iteration i=15 will print 15 and 18
... and so on
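Both ideas above, the toggling increment and the step-of-five pairs, can be sketched in Python as follows. The function names are made up for the illustration.

```python
from itertools import cycle, islice

def alternating_steps(start=0, steps=(3, 2)):
    """Yield start, then keep adding 3, 2, 3, 2, ... forever."""
    value = start
    for step in cycle(steps):
        yield value
        value += step

def pairs(n_pairs):
    """Step by 5 and emit two numbers per step: i and i + 3."""
    out = []
    for i in range(0, 5 * n_pairs, 5):
        out.extend([i, i + 3])
    return out

first_eight = list(islice(alternating_steps(), 8))
print(first_eight)  # [0, 3, 5, 8, 10, 13, 15, 18]
print(pairs(4))     # [0, 3, 5, 8, 10, 13, 15, 18]
```

The generator version never terminates on its own, so `islice` is used to take a fixed number of terms.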
I was pulled in with the tag generate-series, which is a powerful PostgreSQL function. This may have been tagged by mistake (?) but it just so happens that there would be an elegant solution:
SELECT ceil(generate_series(0, 1000, 25) / 10.0)::int;
generate_series() returns 0, 25, 50, 75, ... (it can only produce integer numbers)
division by 10.0 produces numeric data: 0, 2.5, 5, 7.5, ...
ceil() rounds up to your desired result.
The final cast to integer (::int) is optional.
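The same arithmetic can be checked outside PostgreSQL. This is a short Python sketch of the three steps (step by 25, divide by 10, round up). The variable name `series` is illustrative.

```python
import math

# Mirror generate_series(0, 1000, 25), then divide by 10 and round up
series = [math.ceil(n / 10) for n in range(0, 1001, 25)]

print(series[:8])  # [0, 3, 5, 8, 10, 13, 15, 18]
```

Every quotient here is either a whole number or ends in .5, so rounding up lands exactly on the 3, 2, 3, 2 pattern of increments.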

How do I find out the longest run of a number?

This seemed like a trivial question to me, but I cannot get it done correctly. Part of my dataset looks like this
1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0
and contains two “runs” of 1 (not sure if that’s the correct word), one with a length of 3, the other with a length of 5.
How can I use Google Docs or similar spreadsheet applications to find the longest of those runs?
In Excel you can use a single formula to get the maximum number of consecutive 1s, i.e.
=MAX(FREQUENCY(IF(A2:A100=1,ROW(A2:A100)),IF(A2:A100<>1,ROW(A2:A100))))
confirmed with CTRL+SHIFT+ENTER
In Google Sheets you can use the same formula but wrap in arrayformula rather than use CSE, i.e.
=arrayformula(MAX(FREQUENCY(IF(A2:A100=1,ROW(A2:A100)),IF(A2:A100<>1,ROW(A2:A100)))))
This assumes the data is in A2:A100 with no blanks.
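Outside a spreadsheet, the same question can be answered by grouping consecutive equal values. Here is a minimal Python sketch using the sample data from the question. The function name `longest_run` is made up for the illustration.

```python
from itertools import groupby

def longest_run(values, target=1):
    """Length of the longest block of consecutive entries equal to target."""
    return max(
        (sum(1 for _ in group) for key, group in groupby(values) if key == target),
        default=0,  # no occurrence of target at all
    )

data = [1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
print(longest_run(data))  # 5
```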
EDIT: whuber's suggestion is just too simple for me to not update this response. One can just use a simple IF statement checking whether the current row is equal to 1. If it is, it increments a counter (the prior row's value + 1); if it is not, it resets the counter to 0.
You just need to initialize B1 to 1 or 0. Once the formula is written in B2, filling it down updates the cell references automatically for the remaining rows.
So you would start out with:
A B
1 1
1 =IF(A2=1, B1+1, 0)
1
0
0
1
1
1
1
0
0
0
Then fill down:
A B
1 1
1 =IF(A2=1, B1+1, 0)
1 =IF(A3=1, B2+1, 0)
0 =IF(A4=1, B3+1, 0)
0 =IF(A5=1, B4+1, 0)
1 =IF(A6=1, B5+1, 0)
1 =IF(A7=1, B6+1, 0)
1 =IF(A8=1, B7+1, 0)
1 =IF(A9=1, B8+1, 0)
0 =IF(A10=1, B9+1, 0)
0 =IF(A11=1, B10+1, 0)
0 =IF(A12=1, B11+1, 0)
And the result in column B is:
A B
1 1
1 2
1 3
0 0
0 0
1 1
1 2
1 3
1 4
0 0
0 0
0 0
The longest run is then the maximum of column B, e.g. =MAX(B1:B12) for this example. Hopefully the logic is extendable to Google Docs.
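The running-counter logic of column B translates directly into a loop. The sketch below uses the twelve values from the example table. The names `run_lengths`, `column_a` and `column_b` are illustrative.

```python
def run_lengths(values, target=1):
    """Running counter: previous count + 1 on a match, reset to 0 otherwise."""
    counts = []
    current = 0
    for v in values:
        current = current + 1 if v == target else 0
        counts.append(current)
    return counts

column_a = [1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0]
column_b = run_lengths(column_a)

print(column_b)       # [1, 2, 3, 0, 0, 1, 2, 3, 4, 0, 0, 0]
print(max(column_b))  # 4
```

Taking the maximum of the counter column gives the length of the longest run.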
