I'm using Searchkick in a Rails project with an Elasticsearch 6.8 server. I'm trying to boost documents whose year field is equal to the current year or a future year.
I've tried using boost_where and, most recently, boost_by, but neither works. boost_by generates a function_score function that errors out in Elasticsearch. Here's my most recent attempt:
Model.search('value', boost_by: { year: { scale: '5y' } })
Elasticsearch seems to dislike the calendar interval (5y), even though it looks like it should be valid. Here's the reason object from the error:
"caused_by": {
"type": "number_format_exception",
"reason": "For input string: \"5y\""
}
I've tried setting origin and decay along with scale, but that doesn't seem to help.
Here is the query generated by Searchkick (model and field names changed due to a very specific domain model).
Model Search (163.5ms) model_development/_search {"query":{"function_score":{"functions":[{"weight":1,"gauss":{"year":{"scale":"5y"}}}],"query":{"bool":{"should":[{"dis_max":{"queries":[{"multi_match":{"query":"Abreu","boost":10,"operator":"and","analyzer":"searchkick_search","fields":["*.analyzed"],"type":"best_fields"}},{"multi_match":{"query":"Abreu","boost":10,"operator":"and","analyzer":"searchkick_search2","fields":["*.analyzed"],"type":"best_fields"}},{"multi_match":{"query":"Abreu","boost":1,"operator":"and","analyzer":"searchkick_search","fuzziness":1,"prefix_length":0,"max_expansions":3,"fuzzy_transpositions":true,"fields":["*.analyzed"],"type":"best_fields"}},{"multi_match":{"query":"Abreu","boost":1,"operator":"and","analyzer":"searchkick_search2","fuzziness":1,"prefix_length":0,"max_expansions":3,"fuzzy_transpositions":true,"fields":["*.analyzed"],"type":"best_fields"}}]}}]}},"score_mode":"sum"}},"timeout":"11s","_source":false,"size":10000}
Year is likely not a supported interval unit because a year has no absolute length: one day is always 24 hours, but a year is usually 365 days and sometimes 366. Rather than solve for this complexity, Elasticsearch's interval units likely stop at days.
If you want, you can instead use days for your scale:
Model.search('value', boost_by: { year: { scale: '1825d' } })
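Since you mentioned trying origin and decay, those should pass through boost_by as well, letting you shape the gauss curve. A rough sketch, assuming the year field really is mapped as a date; the specific values are illustrative:
Model.search(
  'value',
  boost_by: {
    year: {
      origin: Time.current, # peak of the boost; date fields default to now
      scale: '1825d',       # roughly five years, expressed in days
      decay: 0.5            # score multiplier at a distance of scale from origin
    }
  }
)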
I have a simple timeseries table:
{
  "n": "EXAMPLE",  # Name, Hash Key
  "t": 1640893628, # Unix Timestamp, Range Key
  "v": 10          # Value being stored
}
Every 15 minutes I poll data and insert it into the table. If I want to query values over a 24-hour period, this works well - it equates to a total of 96 records.
Now, say I want to query a larger timespan - 1 or 2 years. That is now tens of thousands of records, which is (in my opinion) impractical to fetch regularly: it would require multiple queries to retrieve larger time ranges, which would hurt response times and be much more costly.
I have thought of a couple of potential solutions to this problem:
1. Replicate data in another table, with larger increments. A table with a single record every 6 hours, for example.
2. Have another table to store common query results, such as the records for "EXAMPLE" for the past week, month, and year. I would periodically update the records in this new table to hold every Nth record from the main table (a total of 100). Something like:
{
  "n": "EXAMPLE#WEEKLY",
  "v": [
    {
      "t": 1640893628,
      "v": 10
    },
    {
      "t": 1640993628,
      "v": 15
    },
    ... 98 more.
  ]
}
I believe #2 is a solid approach. It seems to me like this would be a common enough problem, so I would love to hear about how other people have approached it.
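For what it's worth, here is a rough sketch of the periodic job behind approach #2 using the aws-sdk-dynamodb gem. The table names, the 100-point target, and the sampling step are illustrative assumptions, not a tested design:
require 'aws-sdk-dynamodb'

client = Aws::DynamoDB::Client.new

# Pull the past week of raw points for one series
# (ignores pagination for brevity).
week_ago = Time.now.to_i - 7 * 24 * 3600
resp = client.query(
  table_name: 'timeseries',
  key_condition_expression: '#n = :n AND #t >= :since',
  expression_attribute_names: { '#n' => 'n', '#t' => 't' },
  expression_attribute_values: { ':n' => 'EXAMPLE', ':since' => week_ago }
)

# Keep every Nth record so the summary holds at most ~100 points.
items = resp.items
step = [items.size / 100, 1].max
sampled = items.each_slice(step).map(&:first).first(100)

# Overwrite the precomputed weekly summary item.
client.put_item(
  table_name: 'timeseries_summaries',
  item: {
    'n' => 'EXAMPLE#WEEKLY',
    'v' => sampled.map { |i| { 't' => i['t'], 'v' => i['v'] } }
  }
)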
More options present themselves if you can convert your Unix timestamps into ISO 8601-style strings like 2021-12-31T09:27:58+00:00. If so, DynamoDB's begins_with key condition expression lets us query for discrete calendar time buckets. December 2021, for example, is queryable using n = id1 AND begins_with(t, "2021-12"). The same goes for days and hours, and we can take this one step further by adding other periods as index sort keys.
Some rolling windows are possible, too: n = id1 AND t > [24 hours ago] gives us the last 24 hours.
n (PK)   t (SK)             hour_bucket (LSI1 SK)   week (LSI2 SK)
id1      2021-12-31T10:45   2021-12-31T09-12        2021-52
id1      2021-12-31T13:00   2021-12-31T13-15        2021-52
id1      2022-06-01T22:00   2022-06-01T22-24        2022-22
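A rough sketch of those two queries with the aws-sdk-dynamodb gem, assuming a table named metrics with the key schema above (expression attribute names sidestep any reserved-word issues with the short attribute names):
require 'aws-sdk-dynamodb'
require 'time'

client = Aws::DynamoDB::Client.new

# Calendar bucket: everything for id1 in December 2021.
december = client.query(
  table_name: 'metrics',
  key_condition_expression: '#n = :n AND begins_with(#t, :prefix)',
  expression_attribute_names: { '#n' => 'n', '#t' => 't' },
  expression_attribute_values: { ':n' => 'id1', ':prefix' => '2021-12' }
)

# Rolling window: everything for id1 in the last 24 hours.
# ISO 8601 strings compare lexicographically in timestamp order.
last_24h = client.query(
  table_name: 'metrics',
  key_condition_expression: '#n = :n AND #t > :since',
  expression_attribute_names: { '#n' => 'n', '#t' => 't' },
  expression_attribute_values: { ':n' => 'id1', ':since' => (Time.now.utc - 86_400).iso8601 }
)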
If you are looking for arbitrary time-series queries, you might consider Athena, as the other answer suggested, or AWS's serverless Timestream, which is a "purpose-built time series database that makes it easy to store and analyze trillions of time series data points per day."
You could export the table to Amazon S3 and run Amazon Athena on the exported data. Here’s a blog post describing the process: https://aws.amazon.com/blogs/aws/new-export-amazon-dynamodb-table-data-to-data-lake-amazon-s3/
I have the contract start date for a number of companies, and I want to report on each contract year by creating a column with the contract start updated to a selected year. There are a number of solutions in SQL involving functions like DATE_ADD or DATEFROMPARTS, but I'm having trouble adapting them to Rails (if those functions are available at all).
The closest I've gotten is: Company.select("contract_start + '1 YEAR'::INTERVAL as new_contract_start"). This adds 1 year to each contract start but doesn't take into account contracts older than a year (or started the same year). I've also tried the following but again run into syntax errors:
new_year = 2020
Company.select("contract_start + '#{new_year} - EXTRACT (YEAR from contract_start) YEAR'::INTERVAL")
I'm looking for a solution that can either:
Directly set the year to what I want
Add a variable amount of years based on its distance from the desired year
I'm on Ruby 2.3.3
I think the key here was finding functions compatible with PostgreSQL, which my database is built on. Once I started searching for the functions I thought would help and their PostgreSQL equivalents, I found more compatible solutions, such as: NUMTODSINTERVAL in PostgreSQL
I ended up with:
contract_start_year = 2020
Company.select("contract_start + make_interval(years => CAST (#{contract_start_year} - EXTRACT (YEAR from contract_start) as INT))
I've also made it a bit smarter by adding only the number of years required to reach the latest contract anniversary without going over the report date. This would be problematic if, say, the report start date was "2020-01-01" but the contract start was "2017-06-01": setting the contract date to "2020-06-01" would overshoot the intentions of the report.
report_start = "'2020-07-01'"
Company.select("contract_start + make_interval(years => CAST(EXTRACT(YEAR FROM AGE(CAST(#{report_start} AS DATE), contract_start)) AS INT)) AS new_contract_year")
Note the additional single quotes in report_start, since the SQL needs to read a string literal before it can convert it to a date.
There might be other methods that can "build" the date directly, but this method works well enough for now.
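As an aside, if you would rather skip raw SQL entirely, ActiveSupport's Date#change can set the year directly in Ruby. A rough sketch that mirrors the report logic above, computed per record rather than in a single SELECT:
report_start = Date.new(2020, 7, 1)

Company.find_each do |company|
  # Move the contract anniversary into the report year, stepping back a year
  # if that would land after the report start date. (Date#change raises on
  # Feb 29 when the target year is not a leap year.)
  candidate = company.contract_start.change(year: report_start.year)
  new_contract_start = candidate > report_start ? candidate.prev_year : candidate
end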
I have nodes with the Person label, where I am also storing each person's date of birth. For example:
Person
{
  name: Tim
  D.O.B: 01/23/1990
}
Now I need to calculate his age as of the current date and time (i.e. either 27 years, or 27 years, 10 months, 18 days). Could anyone let me know how I could do this?
P.S.: I tried the following, but I seem to be missing something here:
WITH apoc.date.parse('01/23/1990', 'y', 'MM/dd/yyyy') AS startDate,
apoc.date.format(timestamp(),'y','MM/dd/yyyy') as endDate,
apoc.date.parse(endDate,'y','MM/dd/yyyy') as ed
RETURN ed - 4
The units supported by the APOC date format/parse/add/convert functions are ms, s, m, h, d and their long forms. To work with months, you would need to commit to a specific calendar system, and there is no common month unit of time to use for conversions or additions, since different months are made up of different numbers of days (and then there are the leap days in February).
For years, you're going to have to go with day units and use division by 365.
Here's a query that will get you age in years and days.
WITH apoc.date.parse('01/23/1990', 'd', 'MM/dd/yyyy') as birth, apoc.date.convert(timestamp(), 'ms', 'd') as now
WITH now - birth as daysAlive
RETURN daysAlive / 365 as yearsAlive, daysAlive % 365 as daysExtra
If you want to get into months, it may be better to work with the month/year fields from the MM/dd/yyyy representation and do some math on those. I'll see what we can do about supporting that in APOC.
Something like:
WITH apoc.date.parse('01/23/1990', 'ms', 'MM/dd/yyyy') AS startDate
RETURN apoc.date.convert(timestamp() - startDate, 'ms', 'd');
perhaps?
Hope this helps.
Regards,
Tom
I would like to write points into an influx 0.8 database with the time values given in seconds through HTTP. Here's a sample point in JSON format:
[
  {
    "points": [
      [
        1435692857.0,
        897
      ]
    ],
    "name": "some_series",
    "columns": [
      "time",
      "value"
    ]
  }
]
The documentation is unclear about what format the time values should be in (nanoseconds or milliseconds?) and how to tell InfluxDB what to expect. Currently I'm using a query parameter: precision=s
That seems to work fine; the server returns HTTP status code 200 as expected. When I query the database using InfluxDB's admin interface with select * from some_series, the data points in the table are returned with the expected timestamps. On the graph, however, the time axis is indexed with fractions of seconds, and queries like select * from some_series where time > now() - 1h don't yield any results.
I assume that there is something wrong with the timestamps. I tried multiplying my values by 1000, but then nothing gets inserted into the database, with no visible errors.
What's the problem?
By default, supplied timestamps are assumed to be in milliseconds. I think your writes are defaulting to milliseconds because the query string parameter should be time_precision=s, not precision=s.
See the details under "Time Precision on Written Data" on https://influxdb.com/docs/v0.8/api/reading_and_writing_data.html.
I also think the time value should be an integer rather than a float. I'm not sure how to explain the other behaviors, where the timestamp seems to be the right date and multiplying by 1000 doesn't solve the issue, but I wonder if it's related to writing floats.
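For illustration, a minimal Ruby sketch of a write against the 0.8 series endpoint with seconds precision and an integer timestamp; the database name, credentials, and host are placeholder assumptions:
require 'net/http'
require 'json'
require 'uri'

# InfluxDB 0.8 series write endpoint; time_precision=s tells the server
# the supplied time values are in seconds.
uri = URI('http://localhost:8086/db/mydb/series?u=root&p=root&time_precision=s')

body = [
  {
    'name' => 'some_series',
    'columns' => ['time', 'value'],
    'points' => [[1_435_692_857, 897]] # integer seconds, not a float
  }
]

http = Net::HTTP.new(uri.host, uri.port)
request = Net::HTTP::Post.new(uri.request_uri, 'Content-Type' => 'application/json')
request.body = body.to_json
response = http.request(request)
puts response.code # expect 200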
Please contact the InfluxDB support team at support@influxdb.com for further assistance.
I found the solution! The problem was only partly with the precision. Your answer was correct: the query parameter is called time_precision, and I should post integers instead of floats. Which was probably the first thing I attempted, with no results...
However, due to some time zone problems, my time values were in the future relative to server time, and by default any select statement includes a where time < now() clause. So values were in fact written into the database, but not displayed because of that hidden where clause. The solution was to tell the database to return "future" values, too:
select value from some_series where time < now() + 1h
I need to find all records that were created on a specific day of week.
I only have available to me the standard model datetime timestamps.
How would I go about doing this in ActiveRecord?
To follow up on Justin's answer
where("extract(dow from created_at) = ?", Date.today.wday)
This is what I'm using in my application for Postgres. It will find all records that were created on the same day of the week as today. For example, if today were Tuesday, it would find all records created on Tuesdays.
You can use the DAYOFWEEK function in MySQL and pass it to the :conditions option. Supposing you have a model called Item, this would return all of the items created on Sunday:
Item.all(:conditions => ['dayofweek(created_at) = ?', 1])
Using Postgres you could do something similar with to_char.
Note that using a function like this will probably make the database do a full table scan, since at least MySQL doesn't support adding an index to a function. You may want to consider extracting the day of week out to another column if this is something that you anticipate doing frequently.
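For instance, here is a rough sketch of that extraction; the model, column name, and backfill approach are illustrative assumptions:
# Migration: add an indexed day-of-week column (0-6, matching Ruby's
# Date#wday, where 0 = Sunday).
class AddCreatedWdayToItems < ActiveRecord::Migration
  def change
    add_column :items, :created_wday, :integer
    add_index :items, :created_wday
  end
end

# One-off backfill; keep the column fresh with a callback or database trigger.
Item.find_each { |item| item.update_column(:created_wday, item.created_at.wday) }

# Then the lookup can use the index instead of a full table scan:
Item.where(created_wday: Date.today.wday)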
Obtain the seconds since the Unix epoch; Time#to_i does this in Ruby. Dividing by 86400 gives whole days since the epoch, and taking that modulo 7 gives a day-of-week index (0 to 6). Note that 1970-01-01 was a Thursday, so 0 means Thursday here, and this calculation ignores time zones:
day_of_week = (epoch_seconds / 86400) % 7
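If you need the conventional Sunday-based numbering, you can rotate the result; a small sketch:
epoch_seconds = Time.now.to_i
# (days since epoch) % 7 gives 0 for Thursday, since 1970-01-01 was a Thursday.
thursday_based = (epoch_seconds / 86_400) % 7
# Shift to Ruby's Date#wday convention, where 0 = Sunday.
sunday_based = (thursday_based + 4) % 7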
If you're not opposed to using Ruby, you could try this:
array.select { |arr| ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"].include?(arr.created_at.strftime('%A')) }
I originally tried using dayofweek, as suggested in another answer.
The issue I ran into was that my SQL server was using UTC time while my Rails server was using US Eastern. Records created after 8pm would be picked up, while those that happened before would be considered the previous day.
Here is another related question:
How to filter by day of week in Rails 4.2 and sqlite?