Stata: multiplying each variable of a set of time-series variables with the corresponding variable of another set

Being fairly new to Stata, I'm having difficulty figuring out how to do the following:
I have time-series data on selling price (p) and quantity sold (q) for 10 products in a single data file (i.e., 20 variables, p01-p10 and q01-q10). I am struggling to find the appropriate Stata command to compute the sales revenue (pq) time series for each of these 10 products (i.e., pq01-pq10).
Many thanks for your help.

forval i = 1/10 {
local j : display %02.0f `i'
gen pq`j' = p`j' * q`j'
}
A standard loop over 1/10 won't give you the leading zeros in 01 to 09. For that we need to apply an appropriate display format to the loop index. See also
@article{pr0051,
author = "Cox, N. J.",
title = "Stata tip 85: Looping over nonintegers",
journal = "Stata Journal",
publisher = "Stata Press",
address = "College Station, TX",
volume = "10",
number = "1",
year = "2010",
pages = "160-163(4)",
url = "http://www.stata-journal.com/article.html?article=pr0051"
}
(added later) Another way to do it is
local j = string(`i', "%02.0f")
That makes it a bit more explicit that you are mapping from numbers 1,...,10 to strings "01",...,"10".
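For readers who think more easily in a general-purpose language, the same number-to-padded-string mapping can be sketched in Python. This is only an illustration: the dict of columns and its values are made up and stand in for the Stata dataset, and add_revenue is a hypothetical helper name.

```python
# Hypothetical data: a dict of columns standing in for the Stata dataset.
data = {
    "p01": [2.0, 2.5], "q01": [10, 12],
    "p02": [1.0, 1.5], "q02": [4, 6],
}

def add_revenue(columns, n_products):
    """Add pqNN = pNN * qNN for NN = 01..n_products (zero-padded to 2 digits)."""
    for i in range(1, n_products + 1):
        j = f"{i:02d}"  # 1 -> "01", 10 -> "10"; plays the role of %02.0f in Stata
        columns[f"pq{j}"] = [p * q for p, q in zip(columns[f"p{j}"], columns[f"q{j}"])]
    return columns

add_revenue(data, 2)
print(data["pq01"], data["pq02"])
```

The format specification 02d pads with zeros to a width of two, which is the same job the %02.0f display format does in the loop above.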


/Lua/ How to do this (idk how to call that lol)

I need to generate a trolleybus number for a game that never repeats. For example, once the number "101" has been given out, no other trolleybus may get "101". How can I do that? I have some code, but I know it won't work, and I haven't tested it:
function giveNumber()
local number = math.random(100, 199)
local takedNumbers = {}
local i = 0
local massiv = i+1
script.Parent.pered.SurfaceGui.TextLabel.Text = number
script.Parent.zad.SurfaceGui.TextLabel.Text = number
script.Parent.levo.SurfaceGui.TextLabel.Text = number
script.Parent.pravo.SurfaceGui.TextLabel.Text = number
takedNumbers[massiv] = {number}
end
script.Parent.Script:giveNumber() // what I wrote here? idk...
if number == takedNumbers[massiv] then
giveNumber()
end
I didn't test it because I don't think it will work; something in this code is wrong.
I think this will serve your needs.
In the function generateUniqueNumber, the script loops until it finds a number that is not yet in the array (in other words, one that it hasn't given out yet).
Once it finds that number, it inserts it into the table to remember that it has been given out, and then returns it.
Then at the bottom of the script we just assign the result to the bus labels :-)
--[[
Goal: Give all buses a unique number
]]
-- Variables
local takenNumbers = {}
-- This function returns a random number in the range [100, 199] that has not been taken yet.
-- Note: it loops forever once all 100 numbers have been handed out.
function generateUniqueNumber()
    while true do
        local randomNumber = math.random(100, 199)
        if not table.find(takenNumbers, randomNumber) then
            table.insert(takenNumbers, randomNumber)
            return randomNumber
        end
    end
end
-- Show the same unique number on all four sides of this bus
local busNumber = tostring(generateUniqueNumber())
script.Parent.pered.SurfaceGui.TextLabel.Text = busNumber
script.Parent.zad.SurfaceGui.TextLabel.Text = busNumber
script.Parent.levo.SurfaceGui.TextLabel.Text = busNumber
script.Parent.pravo.SurfaceGui.TextLabel.Text = busNumber
Two things:
I didn't test this code, as Roblox is not installed on the PC I'm currently on.
Please try formatting your code nicely next time. It greatly improves readability! For example, you can use this website:
https://codebeautify.org/lua-beautifier
Simpler
Fill a table with free numbers...
local freenumbers = {}
for i = 100, 199 do freenumbers[#freenumbers + 1] = i end
...then, for every newly taken number, use table.remove() on freenumbers
local takennumbers = {}
if #freenumbers > 0 then
takennumbers[#takennumbers + 1] = table.remove(freenumbers, math.random(1, #freenumbers))
end
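The pool idea above (draw from the numbers that are still free, rather than retrying until an unused one turns up) can be sketched in Python as follows. This is a minimal illustration, not Roblox code: make_number_pool and take are hypothetical names, and the sketch shuffles the pool once and pops from the end instead of removing a random index each time.

```python
import random

def make_number_pool(low=100, high=199, rng=None):
    """Return a function that hands out each number in [low, high] at most once."""
    rng = rng or random.Random()
    free = list(range(low, high + 1))
    rng.shuffle(free)  # shuffle once; popping from the end is then O(1)

    def take():
        # None signals that every number has already been handed out.
        return free.pop() if free else None

    return take

take_number = make_number_pool()
first, second = take_number(), take_number()
print(first, second)
```

Unlike the retry loop, this never spins when the range is nearly exhausted, and the caller can tell when no numbers are left.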

Date with Gaps - Wavelet Analysis in R Using Biwavelet Package

I am performing wavelet analysis using the biwavelet package in R. The date variable is not continuous; it has gaps. When I try to create the graph, I get the following error.
Error in check.datum(d) : The step size must be constant (see approx function to interpolate)
An MWE is given below:
library(foreign)
library(biwavelet)
library(xts)
library(labelled)
library(zoo)
date =c("2020-02-13", "2020-02-14", "2020-02-17", "2020-02-18", "2020-02-19", "2020-02-20", "2020-02-21", "2020-02-24", "2020-02-25", "2020-02-26", "2020-02-27", "2020-02-28", "2020-03-02", "2020-03-03", "2020-03-04", "2020-03-05", "2020-03-06", "2020-03-09", "2020-03-10", "2020-03-11", "2020-03-12", "2020-03-13")
rdate = as.Date(date)
date <- as.Date(date, format = "%Y-%m-%d")
date
class(date)
var = c(-0.077423148, -0.083293147, -0.089214072, -0.095185943, -0.101208754, -0.107282504, -0.113407195, -0.119582824, -0.125809386, -0.125806898, -0.132149309, -0.138584509, -0.145112529, -0.151733354, -0.158446968, -0.165253401, -0.172152638, -0.179144681, -0.186229542, -0.193407193, -0.200677648, -0.208040923)
data = data.frame(date, var)
View(data)
X <- as.xts(data[,-1], order.by = date)
ABC <- data.frame(date, var)
wt.t1=plot(wt(ABC), form = "%b-%d")
How can I resolve this issue?
You can interpolate missing days by following the instructions in the error message:
alldates <- seq(min(date), max(date), by = 1)
interpdata <- approx(date, var, xout = alldates)
ABC <- data.frame(date = alldates, var = interpdata$y)
wt.t1 <- plot(wt(ABC), form = "%b-%d")
However, I think the reason you are missing some days is that they are Saturday or Sunday; I only see weekdays in the dataset.
For many datasets (e.g. stock market trading, etc.) it doesn't make sense to interpolate "what would the price have been on Saturday?", because trades never occur on Saturday or Sunday. In that case, I'd suggest replacing the "date" variable with a simple increment, e.g.
date <- 1:length(date)
ABC <- data.frame(date, var)
wt.t1=plot(wt(ABC), form = "%b-%d")
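To make the interpolation option concrete, here is a small Python sketch of what filling the missing calendar days by linear interpolation amounts to. It is not the biwavelet or approx implementation; interpolate_daily is a hypothetical helper, the values are made up, and the input is assumed to be non-empty and sorted by date without duplicates.

```python
from datetime import date, timedelta

def interpolate_daily(dates, values):
    """Linearly interpolate (dates, values) onto every calendar day in range.

    Assumes dates is non-empty, sorted ascending, and has no duplicates.
    """
    out_dates, out_values = [], []
    for (d0, v0), (d1, v1) in zip(zip(dates, values), zip(dates[1:], values[1:])):
        gap = (d1 - d0).days
        for k in range(gap):  # emit d0 and any missing days before d1
            out_dates.append(d0 + timedelta(days=k))
            out_values.append(v0 + (v1 - v0) * k / gap)
    out_dates.append(dates[-1])
    out_values.append(values[-1])
    return out_dates, out_values

# Made-up values: a Friday and the following Monday.
days = [date(2020, 2, 14), date(2020, 2, 17)]
vals = [1.0, 4.0]
all_days, all_vals = interpolate_daily(days, vals)
print(all_days, all_vals)
```

The result has a constant step of one day, which is the property the error message asks for. The weekend values are invented by the interpolation, which is why the index-based alternative may suit trading-day data better.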

Using a single pattern to capture multiple values containing in a file in lua script

I have a text file that contains data in the format YEAR, CITY, COUNTRY, with one YEAR, CITY, COUNTRY record per line, e.g.:
1896, Athens, Greece
1900, Paris, France
Previously I was using hard-coded data like this:
local data = {}
data[1] = { year = 1896, city = "Athens", country = "Greece" }
data[2] = { year = 1900, city = "Paris", country = "France" }
data[3] = { year = 1904, city = "St Louis", country = "USA" }
data[4] = { year = 1908, city = "London", country = "UK" }
data[5] = { year = 1912, city = "Stockholm", country = "Sweden" }
data[6] = { year = 1920, city = "Antwerp", country = "Belgium" }
Now I need to read the lines from the file and load the values into the private knowledge base "local data = {}".
I can't figure out how to capture multiple values from a line of the file using a single pattern.
My code so far is:
local path = system.pathForFile( "olympicData.txt", system.ResourceDirectory )
-- Open the file handle
local file, errorString = io.open( path, "r" )
if not file then
-- Error occurred; output the cause
print( "File error: " .. errorString )
else
-- Read each line of the file
for line in file:lines() do
local i, value = line:match("%d")
table.insert(data, i)
-- Close the file
io.close(file)
end
file = nil
Given that you read a line like
1896, Athens, Greece
You can simply obtain the desired values using captures.
https://www.lua.org/manual/5.3/manual.html#6.4.1
Captures: A pattern can contain sub-patterns enclosed in parentheses; they describe captures. When a match succeeds, the
substrings of the subject string that match captures are stored
(captured) for future use. Captures are numbered according to their
left parentheses. For instance, in the pattern "(a*(.)%w(%s*))", the
part of the string matching "a*(.)%w(%s*)" is stored as the first
capture (and therefore has number 1); the character matching "." is
captured with number 2, and the part matching "%s*" has number 3.
As a special case, the empty capture () captures the current string
position (a number). For instance, if we apply the pattern "()aa()" on
the string "flaaap", there will be two captures: 3 and 5.
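local example = "1896, Athens, Greece"
-- [^,]+ also matches city names that contain spaces (e.g. "St Louis"), which %w+ would not
local year, city, country = example:match("(%d+),%s*([^,]+),%s*(.+)")
print(tonumber(year), city, country)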
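The same capture idea, applied to every line and collected into a list of records, might look like this in Python. It is a sketch only: it uses Python's re module rather than Lua patterns, parse_lines and LINE_RE are hypothetical names, and lines that do not match are assumed to be safe to skip.

```python
import re

# One number, then two comma-separated fields that may contain spaces.
LINE_RE = re.compile(r"^\s*(\d+)\s*,\s*([^,]+?)\s*,\s*(.+?)\s*$")

def parse_lines(lines):
    """Turn 'YEAR, CITY, COUNTRY' lines into a list of dicts, skipping non-matching lines."""
    data = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            year, city, country = m.groups()
            data.append({"year": int(year), "city": city, "country": country})
    return data

sample = ["1896, Athens, Greece", "1904, St Louis, USA", "not a record"]
print(parse_lines(sample))
```

Each pair of parentheses is one capture, so a single match call yields all three fields at once, and the year is converted to a number so the records match the shape of the hard-coded table.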

How to setup the correct logic for picking a random item from a list based on item's rarity i.e "rare" "normal"

I'm writing a game in Lua using Corona SDK, and I'm having a hard time coming up with the logic for a system like this:
I have different items. I want some items to have a 1/1000 chance of being chosen (a unique item), some to have 1/10, some 2/10, etc.
I was thinking of populating a table and picking a random item from it. For example, I'd add 100 copies of item "X" to the table and then 1 copy of item "Y". By choosing a random index from [1, 101] I more or less achieve what I want, but I was wondering whether there are other ways of doing it.
items = {
Cat = { probability = 100/1000 }, -- i.e. 1/10
Dog = { probability = 200/1000 }, -- i.e. 2/10
Ant = { probability = 699/1000 },
Unicorn = { probability = 1/1000 },
}
function getRandomItem()
local p = math.random()
local cumulativeProbability = 0
for name, item in pairs(items) do
cumulativeProbability = cumulativeProbability + item.probability
if p <= cumulativeProbability then
return name, item
end
end
end
You want the probabilities to add up to 1. So if you increase the probability of an item (or add an item), you'll want to subtract from other items. That's why I wrote 1/10 as 100/1000: it's easier to see how things are distributed and to update them when you have a common denominator.
You can confirm you're getting the distribution you expect like this:
local count = { }
local iterations = 1000000
for i=1,iterations do
local name = getRandomItem()
count[name] = (count[name] or 0) + 1
end
for name, n in pairs(count) do
    print(name, n/iterations)
end
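The cumulative-probability idea can also be written so that the random draw is separated from the lookup, which makes it easy to check by hand. The Python sketch below is an illustration under stated assumptions: it uses integer weights out of 1000 in place of the fractions above, and it swaps the linear walk for a binary search (bisect) over precomputed cumulative weights. ITEMS, pick and get_random_item are hypothetical names.

```python
import random
from bisect import bisect_right
from itertools import accumulate

# Weights taken from the example above, expressed out of 1000.
ITEMS = [("Cat", 100), ("Dog", 200), ("Ant", 699), ("Unicorn", 1)]
NAMES = [name for name, _ in ITEMS]
CUMULATIVE = list(accumulate(weight for _, weight in ITEMS))  # [100, 300, 999, 1000]

def pick(u):
    """Map u in [0, 1) to an item name via the cumulative weights."""
    target = u * CUMULATIVE[-1]
    index = bisect_right(CUMULATIVE, target)
    return NAMES[min(index, len(NAMES) - 1)]  # clamp guards against rounding at the top end

def get_random_item(rng=random):
    """Draw one item name with probability proportional to its weight."""
    return pick(rng.random())

print(pick(0.05), pick(0.25), pick(0.5), pick(0.9995))
```

Because pick is deterministic, the boundaries can be checked directly: draws below 0.1 land on Cat, the next 0.2 on Dog, and only the last 0.001 of the interval yields Unicorn.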
I believe this answer is a lot easier to work with - albeit slightly slower in execution.
local chancesTbl = {
-- You can fill these with any non-negative integer you want
-- No need to make sure they sum up to anything specific
["a"] = 2,
["b"] = 1,
["c"] = 3
}
local function GetWeightedRandomKey()
local sum = 0
for _, chance in pairs(chancesTbl) do
sum = sum + chance
end
local rand = math.random(sum)
local winningKey
for key, chance in pairs(chancesTbl) do
winningKey = key
rand = rand - chance
if rand <= 0 then break end
end
return winningKey
end

Calculating vector distance for classification with mixed features

I'm doing a project comparing the effectiveness of various classification algorithms, but I'm stuck on a frustrating point. The data may be found here: http://archive.ics.uci.edu/ml/datasets/Adult The classification problem is whether or not a person makes over 50k a year based on their census data.
Two example entries are as follows:
45, Private, 98092, HS-grad, 9, Married-civ-spouse, Sales, Husband, White, Male, 0, 0, 60, United-States, <=50K
50, Self-emp-not-inc, 386397, Bachelors, 13, Married-civ-spouse, Sales, Husband, White, Male, 0, 0, 60, United-States, <=50K
I'm familiar with using Euclidean distance to measure the difference between vectors, but I'm not sure how to work with a mix of continuous and discrete attributes. Are there effective methods for representing the difference between two vectors in a meaningful way? I'm having a hard time seeing how large values like the third attribute (a weight calculated by the people who extracted the data set, such that records with similar weights should have similar attributes), and the differences between them, can coexist with discrete features like male or female, which contribute a Euclidean distance of only 1 if I understand the method correctly. I'm sure some categories could be removed, but I don't want to remove something that contributes significantly to classification. I'll tackle k-NN first once I get this figured out, then a Bayesian classifier, and finally a decision tree model like C4.5 or ID3 if I have the time.
Sure, you can extend Euclidean distance in any number of ways. The simplest extension would be the following rule:
distance = 0 in that coordinate if there's a match, 1 otherwise
The challenge will be making the concept of distance "relevant" for the k-NN follow up. In some cases (e.g. education), I think it will be best to map education (discrete variable) into a continuous variable, such as years of education. So you'll need to write a function which maps e.g. "HS-grad" to 12, "Bachelors" to 16, something like that.
Beyond that, using k-NN directly isn't going to work well, because "distance" across multiple dissimilar dimensions isn't well defined. I think you'll be better off throwing some of these dimensions away or weighting them differently. I don't know what the third number in your dataset (e.g. 98092) means, but with naive Euclidean distance it would be heavily overweighted compared to other dimensions such as age.
I'm not a machine learning expert, but I would personally be tempted to start k-NN on a reduced dimensionality dataset where you just pick some broad demographics (e.g. age, education, marital status) and ignore the trickier/"noisier" categories.
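The match/mismatch rule for discrete attributes, combined with rescaling of the continuous ones so that no single large-valued column dominates, can be sketched in Python as below. This is similar in spirit to Gower distance, not a method taken from the answer. mixed_distance is a hypothetical helper, the records are reduced to four fields, and the numeric spans are assumed values rather than ones computed from the real data.

```python
def mixed_distance(a, b, numeric_ranges):
    """Average per-feature dissimilarity between two records of equal length.

    numeric_ranges maps a feature index to (max - min) for that feature;
    any index not listed is treated as categorical (0 if equal, 1 otherwise).
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in numeric_ranges:
            span = numeric_ranges[i]
            total += abs(x - y) / span if span else 0.0
        else:
            total += 0.0 if x == y else 1.0
    return total / len(a)

# Reduced records: (age, workclass, education, hours per week).
r1 = (45, "Private", "HS-grad", 60)
r2 = (50, "Self-emp-not-inc", "Bachelors", 60)
ranges = {0: 73, 3: 98}  # assumed spans, not computed from the real data
print(mixed_distance(r1, r2, ranges))
```

Every feature contributes a value between 0 and 1, so a five-year age gap and a mismatch in workclass are at least on a comparable scale. Per-feature weights could be added if some attributes should count for more.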
You need to code your categorical variables as 1-of-n binary variables (n choices for the variable, and of those variables one and only one is active). Then standardise your features---for each feature, subtract its mean and divide by standard deviation. Or normalise into the range 0-1. It's not perfect, but this will at least make dimensions comparable.
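A minimal Python sketch of those two steps, using only the standard library, is given below. The column values are toy data, one_hot and standardise are hypothetical helper names, and for brevity only the numeric column is standardised here; the binary columns could be passed through standardise in the same way.

```python
from statistics import mean, pstdev

def one_hot(values):
    """Encode a categorical column as 1-of-n rows; categories sorted for a stable order."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]

def standardise(column):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma if sigma else 0.0 for x in column]

ages = [45, 50, 28]                 # toy values
sex = ["Male", "Male", "Female"]    # toy values
rows = [[z] + encoded for z, encoded in zip(standardise(ages), one_hot(sex))]
print(rows)
```

Each resulting row is a purely numeric vector, so ordinary Euclidean distance can be applied to it.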
Create a separate Map for each categorical field and use it to convert each category to a double value.
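import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
def createMap(data: RDD[String]) : Map[String,Double] = {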
var mapData:Map[String,Double] = Map()
var counter = 0.0
data.collect().foreach{ item =>
counter = counter +1
mapData += (item -> counter)
}
mapData
}
def getLablelValue(input: String): Int = input match {
case "<=50K" => 0
case ">50K" => 1
}
val census = sc.textFile("/user/cloudera/census_data.txt")
val orgTypeRdd = census.map(line => line.split(", ")(1)).distinct
val gradeTypeRdd = census.map(line => line.split(", ")(3)).distinct
val marStatusRdd = census.map(line => line.split(", ")(5)).distinct
val jobTypeRdd = census.map(line => line.split(", ")(6)).distinct
val familyStatusRdd = census.map(line => line.split(", ")(7)).distinct
val raceTypeRdd = census.map(line => line.split(", ")(8)).distinct
val genderTypeRdd = census.map(line => line.split(", ")(9)).distinct
val countryRdd = census.map(line => line.split(", ")(13)).distinct
val salaryRange = census.map(line => line.split(", ")(14)).distinct
val orgTypeMap = createMap(orgTypeRdd)
val gradeTypeMap = createMap(gradeTypeRdd)
val marStatusMap = createMap(marStatusRdd)
val jobTypeMap = createMap(jobTypeRdd)
val familyStatusMap = createMap(familyStatusRdd)
val raceTypeMap = createMap(raceTypeRdd)
val genderTypeMap = createMap(genderTypeRdd)
val countryMap = createMap(countryRdd)
val salaryRangeMap = createMap(salaryRange)
val featureVector = census.map { line =>
  val fields = line.split(", ")
  // The label (fields(14)) is deliberately kept out of the feature vector.
  LabeledPoint(
    getLablelValue(fields(14)),
    Vectors.dense(
      fields(0).toDouble, orgTypeMap(fields(1)), fields(2).toDouble,
      gradeTypeMap(fields(3)), fields(4).toDouble, marStatusMap(fields(5)),
      jobTypeMap(fields(6)), familyStatusMap(fields(7)), raceTypeMap(fields(8)),
      genderTypeMap(fields(9)), fields(10).toDouble, fields(11).toDouble,
      fields(12).toDouble, countryMap(fields(13))))
}
