VIN Decoding: Model Year - decoder

I'm trying to code a simple VIN (ISO 3779) decoder: manufacturer and model year. I'm having some issues w/ decoding the model year, though. According to Wikipedia:
For passenger cars, and for multipurpose passenger vehicles and trucks with a gross vehicle weight rating of 10,000 lb (4,500 kg) or less, if position 7 is numeric, the model year in position 10 of the VIN refers to a year in the range 1980–2009. If position 7 is alphabetic, the model year in position 10 of VIN refers to a year in the range 2010–2039.
My car's VIN (Model Year 2012) has the following info:
VSS---1--C-------
12345678901234567
Manufacturer: SEAT, Model Year: 1982 (Some online VIN decoders give me 1982, some others give me 2012)
How can I modify my decoder so I get this right, other than doing a nasty if (Manufacturer == "SEAT") Year += 30; hack?

Having read positions 7 and 10, here's some PHP code:
$year = date_1980_2009( $position_10 ); # use your current date function...
if ( preg_match( "/^[A-Z]$/i", $position_7 ) ) $year += 30; # add 30 years if 7 is alphabetic
Having said that, your car doesn't seem to be following the rules. Exceptional cases require coding exceptions -- which aren't hacks. Sorry.
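For comparison, here is a minimal Python sketch of the same rule. It assumes the usual position-10 code sequence (A=1980 through Y=2000, skipping I, O, Q, U and Z, then 1 to 9 for 2001 to 2009), which is not spelled out in the question. The `model_year` function and the sample VINs are made up for illustration, and the sketch only implements the rule as quoted, so a car that does not follow it (like the SEAT above) will still decode to 1982.

```python
# Position-10 codes in year order, starting at 1980 (assumed standard sequence).
YEAR_CODES = "ABCDEFGHJKLMNPRSTVWXY123456789"


def model_year(vin):
    """Decode the model year from positions 7 and 10 of a 17-character VIN."""
    if len(vin) != 17:
        raise ValueError("a VIN must have exactly 17 characters")
    # Position 10 is index 9; str.index raises ValueError for an invalid code.
    year = 1980 + YEAR_CODES.index(vin[9].upper())
    # Position 7 is index 6; an alphabetic character selects the 2010-2039 cycle.
    if vin[6].isalpha():
        year += 30
    return year


# Hypothetical VINs, used only to exercise both branches.
print(model_year("VSSZZZ1ZZCZ000000"))  # 1982 (position 7 is numeric)
print(model_year("1G1ZZZAZZCZ000000"))  # 2012 (position 7 is alphabetic)
```

A per-manufacturer exception, if needed, could then be layered on top of this function rather than mixed into the rule itself.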

Related

How do I get the first two digits from a number?

I need to make a code that tells you the century when you give the year. I have this:
local kata = {}
function kata.century(number)
if number%100 == 0 then -- I need to get the first two numbers
return
else
return number/100 + 1
end
end
return kata
I basically need a line that gives me the first two numbers of the year for years like "1700" and "2000"
so I can divide them by 100 and add 1.
(i'm a beginner btw)
In Lua 5.3+, use number//100.
For earlier versions, use math.floor(number/100).
According to the Gregorian calendar, 1 CE was the first year of the 1st Century CE. Since a century is a period of 100 years, this means that the first year of any century in the common era ends with a 1; thus 2000 was the last year of the 20th Century, and 2001 was the first year of the 21st Century.
Finding the century from the first two digits of the year alone will not work for this strictly correct method of identification. Taking the first two digits of 2000 and adding 1 would yield the 21st Century. But instead of using math.floor to truncate the result of division by 100, one can use math.ceil to get the smallest integer greater than or equal to the result of the division.
function century (year)
return math.ceil(year / 100)
end
This century function gives the correct century given a year in the common era:
> century(1)
1
> century(100)
1
> century(101)
2
> century(2000)
20
> century(2001)
21
There is a convention in popular usage that centuries should be numbered based on shared digits instead of the Gregorian calendar. In this usage all years beginning with 20 are in the 21st Century, making 2000 the first year of the 21st Century. Since there is no year 0 in the Gregorian calendar, this means that the 1st Century (from 1 CE to 99 CE under this convention) spans 99 years, but all other centuries in the common era span 100 years (e.g., 100 CE to 199 CE). Finding the century from the year using this convention can be done by dividing the year by 100, taking the floor of the result, and adding 1.
If the goal is to match popular expectations and follow the general popular misunderstanding of numbering centuries, use the floor method. But, if the goal is to get correct and consistent numbering of centuries based on the Gregorian calendar, use the ceiling method.
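The two conventions can be compared side by side. The following is a small Python sketch rather than Lua, and the function names `century_strict` and `century_popular` are made up for illustration.

```python
def century_strict(year):
    """Gregorian numbering: years 1-100 are century 1, 101-200 century 2, ..."""
    # Integer ceiling of year / 100 without floating point.
    return (year + 99) // 100


def century_popular(year):
    """Shared-digits numbering: every year starting with 20 is century 21."""
    return year // 100 + 1


for y in (1, 100, 101, 1999, 2000, 2001):
    print(y, century_strict(y), century_popular(y))
```

The two functions disagree only for years that are exact multiples of 100, such as 100 or 2000.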

How to use foreach / forv to replace duplicates in an increasing order

I have a "raw" data set that I'm trying to clean. The data set consists of individuals with the variable age between the years 2000 and 2010. Around 20000 individuals in the data set have the same problem.
The variable age is not increasing in the years 2004-2006. For example, for one individual it looks like this:
2000: 16,
2001: 17,
2002: 18,
2003: 19,
2004: 19,
2005: 19,
2006: 19,
2007: 23,
2008: 24,
2009: 25,
2010: 26,
So far I have tried to generate variables for the max age and max year:
bysort id: egen last_year=max(year)
bysort id: egen last_age=max(age)
and then use foreach combined with lags to replace the age variable in decreasing order, starting from the new variable last_age (which is now 26 in all years), so that it looks like this:
2010: 26
2009: 25 (26-1)
2008: 24 (26-2) , and so on.
However, I have some problem with finding the correct code for this problem.
Assuming that for each individual the first value of age is not missing and is correct, something like this might work
bysort id (year): replace age = age[1]+(year-year[1])
Alternatively, if the last value of age is assumed to always be accurate,
bysort id (year): replace age = age[_N]-(year[_N]-year)
Or, just fix the ages where there is no observation-to-observation change in age
bysort id (year): replace age = age[_n-1]+(year-year[_n-1]) if _n>1 & age==age[_n-1]
In the absence of sample data none of these have been tested.
William's code is very much to the point, but a few extra remarks won't fit easily into a comment.
Suppose we have age already and generate two other estimates going forward and backward as he suggests:
bysort id (year): gen age2 = age[1] + (year - year[1])
bysort id (year): gen age3 = age[_N] - (year[_N] - year)
Now if all three agree, we are good, and if two out of three agree, we will probably use the majority vote. Either way, that is the median; the median will be, for 3 values, the sum MINUS the minimum MINUS the maximum.
gen median = (age + age2 + age3) - max(age, age2, age3) - min(age, age2, age3)
If we get three different estimates, we should look more carefully.
edit age* if max(age, age2, age3) > median & median > min(age, age2, age3)
A final test is whether medians increase in the same way as years:
bysort id (year) : assert (median - median[_n-1]) == (year - year[_n-1]) if _n > 1
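The forward, backward and median estimates can also be sketched outside Stata. The Python below works on one individual's sorted observations and uses the sample values from the question; the function name `repair_ages` is made up for illustration, and the sketch assumes that at least two of the three estimates agree in every year.

```python
from statistics import median


def repair_ages(years, ages):
    """Return the per-year median of the observed, forward and backward ages."""
    # Forward estimate: trust the first observation and count up.
    forward = [ages[0] + (y - years[0]) for y in years]
    # Backward estimate: trust the last observation and count down.
    backward = [ages[-1] - (years[-1] - y) for y in years]
    # With three values, the median is the majority vote when two agree.
    return [median(triple) for triple in zip(ages, forward, backward)]


years = list(range(2000, 2011))
ages = [16, 17, 18, 19, 19, 19, 19, 23, 24, 25, 26]
fixed = repair_ages(years, ages)
print(fixed)

# Same check as the final assert above: ages must advance in step with years.
assert all(fixed[i] - fixed[i - 1] == years[i] - years[i - 1]
           for i in range(1, len(years)))
```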

Why does pytz.country_timezones('cn') give different results on two CentOS systems?

Two computers, both running CentOS 6.5 with kernel 3.10.44, give different results.
One returns [u'Asia/Shanghai', u'Asia/Urumqi'], and the other returns ['Asia/Shanghai', 'Asia/Harbin', 'Asia/Chongqing', 'Asia/Urumqi', 'Asia/Kashgar'].
Is there any configuration that makes the first result the same as the second?
I have the following Python code:
import time
from datetime import datetime

import pytz

def get_date():
    date = datetime.utcnow()
    from_zone = pytz.timezone("UTC")
    to_zone = pytz.timezone("Asia/Urumqi")
    date = from_zone.localize(date)
    date = date.astimezone(to_zone)
    return date

def get_curr_time_stamp():
    date = get_date()
    stamp = time.mktime(date.timetuple())
    return stamp

cur_time = get_curr_time_stamp()
print "1", time.strftime("%Y %m %d %H:%M:%S", time.localtime(time.time()))
print "2", time.strftime("%Y %m %d %H:%M:%S", time.localtime(cur_time))
When I use this code to get the time, the output on the computer that returns 2 time zones is:
1 2016 04 20 08:53:18
2 2016 04 20 06:53:18
and on the other (which returns 5 time zones) it is:
1 2016 04 20 08:53:18
2 2016 04 20 08:53:18
I don't know why?
You probably just have an outdated version of pytz on the system returning five time zones (or perhaps on both systems). You can find the latest releases here. It's important to stay on top of time zone updates, as the various governments of the world change their time zones often.
Like most systems, pytz gets its data from the tz database. The five time zones for China were reduced to two in version 2014f (corresponding to pytz 2014.6). From the release notes:
China's five zones have been simplified to two, since the post-1970
differences in the other three seem to have been imaginary. The
zones Asia/Harbin, Asia/Chongqing, and Asia/Kashgar have been
removed; backwards-compatibility links still work, albeit with
different behaviors for time stamps before May 1980. Asia/Urumqi's
1980 transition to UTC+8 has been removed, so that it is now at
UTC+6 and not UTC+8. (Thanks to Luther Ma and to Alois Treindl;
Treindl sent helpful translations of two papers by Guo Qingsheng.)
Also, you may wish to read Wikipedia's Time in China article, which explains that the Asia/Urumqi entry is for "Ürümqi Time", which is used unofficially in some parts of the Xinjiang region. This zone is not recognized by the Chinese government, and is considered a politically charged issue. As such, many systems choose to omit the Urumqi time zone, despite it being listed in the tz database.
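The two-hour gap in the question's output is consistent with the release note: older tz data put Asia/Urumqi at UTC+8, newer data puts it at UTC+6. The sketch below illustrates only that offset change. It is written for Python 3 and uses fixed offsets from the standard library instead of pytz, so the names `old_urumqi` and `new_urumqi` are stand-ins rather than real tz database lookups, and the UTC instant is an assumed value chosen to line up with the output shown in the question.

```python
from datetime import datetime, timedelta, timezone

# Assumed UTC instant corresponding to the question's sample output.
utc_now = datetime(2016, 4, 20, 0, 53, 18, tzinfo=timezone.utc)

# Stand-ins for Asia/Urumqi before and after tz database release 2014f.
old_urumqi = timezone(timedelta(hours=8))
new_urumqi = timezone(timedelta(hours=6))

old_wall = utc_now.astimezone(old_urumqi)
new_wall = utc_now.astimezone(new_urumqi)

fmt = "%Y %m %d %H:%M:%S"
print(old_wall.strftime(fmt))  # 2016 04 20 08:53:18
print(new_wall.strftime(fmt))  # 2016 04 20 06:53:18
```

Both values describe the same instant; only the wall-clock reading differs, which is why updating pytz on both machines should make their output agree.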

Generating means of a variable using dummy variables & foreach in Stata

My dataset includes TWO main variables X and Y.
Variable X represents distinct codes (e.g. 001X01, 001X02, etc) for multiple computer items with different brands.
Variable Y represents the tax charged for each code of variable X (e.g. 15 = 15% for 001X01) at a store.
I've created categories for these computer items using dummy variables (e.g. HD dummy variable for Hard-Drives, takes value of 1 when variable X represents a HD, etc). I have a list of over 40 variables (two of them representing X and Y, and the rest is a bunch of dummy variables for the different categories I've created for computer items).
I would like to display the averages of all these categories using a loop in Stata, but I'm not sure how to do this.
For example the code:
mean Y if HD == 1
Mean estimation Number of obs = 5
--------------------------------------------------------------
| Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
Tax | 7.1 2.537716 1.154172 15.24583
gives me the mean Tax for the category representing Hard Drives. How can I use a loop in Stata to automatically display all the mean Taxes charged for each category? I would do it by hand without a problem, but I want to repeat this process for multiple years, so I would like to use a loop for each year in order to come up with this output.
My goal is to create a separate Excel file with each of the computer categories I've created (38 total) and the average tax for each category by year.
Why bother with the loop and creating the indicator variables? If I understand correctly, your initial dataset allows the use of a simple collapse:
clear all
set more off
input ///
code tax str10 categ
1 0.15 "hd"
2 0.25 "pend"
3 0.23 "mouse"
4 0.29 "pend"
5 0.16 "pend"
6 0.50 "hd"
7 0.54 "monitor"
8 0.22 "monitor"
9 0.21 "mouse"
10 0.76 "mouse"
end
list
collapse (mean) tax, by(categ)
list
To take the results to Excel, you can try export excel or putexcel.
Run help collapse and help export for details.
Edit
Because you insist, below is an example that gives the same result using loops.
I assume the same data input as before. Some testing using this example database
with expand 1000000, shows that speed is virtually the same. But almost surely,
you (including your future you) and your readers will prefer collapse.
It is much clearer, cleaner and concise. It is even prettier.
levelsof categ, local(parts)
gen mtax = .
quietly {
foreach part of local parts {
summarize tax if categ == "`part'", meanonly
replace mtax = r(mean) if categ == "`part'"
}
}
bysort categ: keep if _n == 1
keep categ mtax
Stata has features that make it quite different from other languages. Once you
start getting a hold of it, you will find that many things done with loops elsewhere,
can be made loop-less in Stata. In many cases, the latter style will be preferred.
See the corresponding help files using help <command>, and if you are not familiar with saved results (e.g. r(mean)), type help return.
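For readers who think in terms of loops, the grouping that collapse performs can be sketched in plain Python. This is only an illustration of the idea, not a replacement for the Stata code; it reuses the example rows from above, and the function name `mean_by_category` is made up.

```python
from collections import defaultdict

# Same example data as the Stata input block: (code, tax, categ).
rows = [
    (1, 0.15, "hd"), (2, 0.25, "pend"), (3, 0.23, "mouse"),
    (4, 0.29, "pend"), (5, 0.16, "pend"), (6, 0.50, "hd"),
    (7, 0.54, "monitor"), (8, 0.22, "monitor"), (9, 0.21, "mouse"),
    (10, 0.76, "mouse"),
]


def mean_by_category(rows):
    """Return {category: mean tax}, the analogue of collapse (mean) tax, by(categ)."""
    totals = defaultdict(lambda: [0.0, 0])  # category -> [sum of tax, count]
    for _code, tax, categ in rows:
        totals[categ][0] += tax
        totals[categ][1] += 1
    return {categ: total / count for categ, (total, count) in totals.items()}


for categ, mean_tax in sorted(mean_by_category(rows).items()):
    print(categ, round(mean_tax, 4))
```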
A supplement to Roberto's excellent answer: After collapse, you will need a loop to export the results to excel.
levelsof categ, local(levels)
foreach x of local levels {
export excel `x', replace
}
I prefer to use numerical codes for variables such as your category variable. I then assign them value labels. Here's a version of Roberto's code which does this and which, for closer correspondence to your problem, adds a "year" variable
input code tax categ year
1 0.15 1 1999
2 0.25 2 2000
3 0.23 3 2013
4 0.29 1 2010
5 0.16 2 2000
6 0.50 1 2011
7 0.54 4 2000
8 0.22 4 2003
9 0.21 3 2004
10 0.76 3 2005
end
#delim ;
label define catl
1 hd
2 pend
3 mouse
4 monitor
;
#delim cr
label values categ catl
collapse (mean) tax, by(categ year)
levelsof categ, local(levels)
foreach x of local levels {
export excel `:label (categ) `x'', replace
}
The #delim ; command makes it possible to easily list each code on a separate line. The "label" function in the export statement is an extended macro function to insert a value label into the file name.

How to find if range is contained in an array of ranges?

Example
business_hours['monday'] = [800..1200, 1300..1700]
business_hours['tuesday'] = [900..1100, 1300..1700]
...
I then have a bunch of events which occupy some of these intervals, for example
event = { start_at: somedatetime, end_at: somedatetime }
Iterating over events from a certain date to a certain date, I create another array
busy_hours['monday'] = [800..830, 1400..1415]
...
Now my challenges are
Creating an available_hours array that contains business_hours minus busy_hours
available_hours = business_hours - busy_hours
Given a certain duration say 30 minutes, find which time slots are available in available_hours. In the examples above, such a method would return
available_slots['monday'] = [830..900, 845..915, 900..930, and so on]
Note that it checks available_hours in increments of 15 minutes for slots of the specified duration.
Thanks for the help!
I think this is a job for bit fields. Unfortunately this solution will rely on magic numbers, conversions helpers and a fair bit of binary logic, so it won't be pretty. But it will work and be very efficient.
This is how I'd approach the problem:
Atomize your days into reasonable time intervals. I'll follow your example and treat each 15 minute block of time as one time chunk (mostly because it keeps the example simple). Then represent your availability per hour as a hex digit.
Example:
0xF = 0b1111 => available for the whole hour.
0xC = 0b1100 => available for the first half of the hour.
String 24 of these together to represent a day. Or fewer if you can be sure that no events will occur outside of the range. The example continues assuming 24 hours.
From this point on I've split long Hex numbers into words for legibility
Assuming the day goes from 00:00 to 23:59 business_hours['monday'] = 0x0000 0000 FFFF 0FFF F000 0000
To get busy_hours you store events in a similar format, and just | (OR) them all together.
Example:
event_a = 0x0000 0000 00F0 0000 0000 0000 # 10:00 - 11:00
event_b = 0x0000 0000 0000 07F8 0000 0000 # 13:15 - 15:15
busy_hours = event_a | event_b
From busy_hours and business_hours you can get available hours:
available_hours = business_hours & (busy_hours ^ 0xFFFF FFFF FFFF FFFF FFFF FFFF)
The xor (^) essentially translates busy_hours into not_busy_hours. Anding (&) not_busy_hours with business_hours gives us the available times for the day.
This scheme also makes it simple to compare available hours for many people.
all_available_hours = person_a_available_hours & person_b_available_hours & person_c_available_hours
Then to find a time slot that fits into available hours. You need to do something like this:
Convert your length of time into a hex number in the same format as an hour, where the ones represent all the time chunks the time slot will cover. Next, right shift the number so there are no trailing 0's.
Examples are better than explanations:
0x1 => 15 minutes, 0x3 => half hour, 0x7 => 45 minutes, 0xF => full hour, ... 0xFF => 2 hours, etc.
Once you've done that, you do this:
acceptable_times = []
chunks_in_slot = time_slot_in_hex.bit_length # number of 15 minute chunks in the time slot
(0..24 * 4 - chunks_in_slot).each do |i|
  shifted = time_slot_in_hex << i # the time slot moved i chunks earlier than the end of the day
  acceptable_times.unshift(shifted) if available_hours & shifted == shifted
end
The high end of the range is a bit of a mess, so let's look a bit more at it. We don't want to shift too many times, or else we could start getting false positives at the early end of the spectrum.
24 * 4: 24 hours in the day, with each represented by 4 bits.
- chunks_in_slot: Subtract 1 check for each 15 minutes in the time slot we're looking for. This value can also be found by (Math.log(time_slot_in_hex)/Math.log(2)).floor + 1
The loop starts at the end of the day, checking each time slot and moving earlier by a time chunk (15 minutes in this example) on each iteration. If the time slot is available, it's added to the start of acceptable_times. So when the process finishes, acceptable_times is sorted in order of occurrence.
The cool thing is that this implementation allows for time slots that incorporate a break: the slot pattern can contain a gap, so it still matches when your attendee has a busy period in their day that bisects the time slot you're looking for.
It's up to you to write helper functions that translate between an array of ranges (ie: [800..1200, 1300..1700]) and the hex representation. The best way to do that is to encapsulate the behaviour in an object and use custom accessor methods. And then use the same objects to represent days, events, busy hours, etc. The only thing that's not built into this scheme is how to schedule events so that they can span the boundary of days.
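As a rough illustration of those helpers, here is a Python sketch (not Ruby) of the conversion and the slot search. It assumes times are HHMM integers aligned to 15 minute boundaries and that chunk 0 (00:00) is the most significant bit, matching the hex layout above. The names `chunk_index`, `ranges_to_mask` and `free_slots` are made up for illustration.

```python
CHUNKS_PER_DAY = 24 * 4  # one bit per 15-minute chunk


def chunk_index(hhmm):
    """Convert an HHMM integer such as 1315 into a chunk number (0..96)."""
    return (hhmm // 100) * 4 + (hhmm % 100) // 15


def ranges_to_mask(ranges):
    """Build a day bit field from (start, end) pairs; end is exclusive."""
    mask = 0
    for start, end in ranges:
        for chunk in range(chunk_index(start), chunk_index(end)):
            mask |= 1 << (CHUNKS_PER_DAY - 1 - chunk)
    return mask


def free_slots(available, n_chunks):
    """Return HHMM start times at which n_chunks consecutive chunks are free."""
    slot = (1 << n_chunks) - 1
    starts = []
    for chunk in range(CHUNKS_PER_DAY - n_chunks + 1):
        shifted = slot << (CHUNKS_PER_DAY - n_chunks - chunk)
        if (available & shifted) == shifted:
            starts.append((chunk // 4) * 100 + (chunk % 4) * 15)
    return starts


business = ranges_to_mask([(800, 1200), (1300, 1700)])
busy = ranges_to_mask([(800, 830), (1400, 1415)])
available = business & ~busy
print(free_slots(available, 2)[:3])  # [830, 845, 900]
```

Unlike the Ruby loop, this version walks forward from the start of the day, so the results come out in chronological order without needing unshift.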
To answer your question's title, find if an array of ranges contains a range:
ary = [800..1200, 1300..1700]
test = 800..830
p ary.any? {|rng| rng.include?(test.first) and rng.include?(test.last)}
# => true
test = 1245..1330
p ary.any? {|rng| rng.include?(test.first) and rng.include?(test.last)}
# => false
which could be written as
class Range
def include_range?(r)
self.include?(r.first) and self.include?(r.last)
end
end
Okay, I don't have time to write up a full solution, but the problem does not seem too difficult to me. I hacked together the following primitive methods you can use to help in constructing your solution (You may want to subclass Range rather than monkey patching, but this will give you the idea):
class Range
  # true only if the other range lies entirely within this one
  def contains(range)
    first <= range.first && last >= range.last
  end

  # returns the parts of this range that are left after removing the other range
  def -(range)
    out = []
    unless range.first <= first && range.last >= last
      out << Range.new(first, range.first) if range.first > first
      out << Range.new(range.last, last) if range.last < last
    end
    out
  end
end
You can iterate over business hours and find the one that contains the event like so:
event_range = event.start_time..event.end_time
matching_range = business_hours.find{|r| r.contains(event_range)}
You can construct the new array like this (pseudocode, not tested):
available_hours = business_hours.dup
available_hours.delete(matching_range)
available_hours += matching_range - event_range
That should be a pretty reusable approach. Of course you'll need something totally different for the next part of your question, but this is all I have time for :)
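The same subtraction idea can be sketched in Python for the first part of the question (business_hours minus busy_hours). This is only an illustration under a few assumptions: intervals are (start, end) tuples of HHMM integers, end is exclusive, and the function names `subtract` and `available_hours` are made up. Unlike the Ruby `-` above, it returns the interval unchanged when the busy period does not overlap it at all.

```python
def subtract(interval, busy):
    """Return the parts of interval that remain after removing busy."""
    start, end = interval
    busy_start, busy_end = busy
    # No overlap: the interval survives unchanged.
    if busy_end <= start or busy_start >= end:
        return [interval]
    pieces = []
    if busy_start > start:
        pieces.append((start, busy_start))
    if busy_end < end:
        pieces.append((busy_end, end))
    return pieces


def available_hours(business, busy_list):
    """Subtract every busy interval from every business interval."""
    free = list(business)
    for busy in busy_list:
        free = [piece for interval in free for piece in subtract(interval, busy)]
    return free


monday = available_hours([(800, 1200), (1300, 1700)], [(800, 830), (1400, 1415)])
print(monday)  # [(830, 1200), (1300, 1400), (1415, 1700)]
```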

Resources