I am new to bokeh/pandas and am trying to plot a trend line with month-year on the x-axis and integer values on the y-axis.
My data looks like this:
year_month emp_count
0 2015-09 1450425
1 2015-10 3093811
2 2015-11 3316241
3 2015-12 3308658
4 2016-01 3402191
To plot using bokeh I am converting both columns to ndarray. When I convert the year_month column to an ndarray, each value shows up as a Period. I used the to_period('M') method to derive year_month from a date column.
temp_df.year_month.values
>>output
array([Period('2015-09', 'M'), Period('2015-10', 'M'),
Period('2015-11', 'M'), Period('2015-12', 'M'),
Period('2016-01', 'M'), Period('2016-02', 'M'),
So when I plot using this data, I get the following error:
TypeError: Object of type 'Period' is not JSON serializable
To avoid this error I converted the year_month column to string, but I still get the same error. My complete code looks like this:
temp_df.year_month = temp_df.year_month.astype(str)
output_file('trend1.html')
p = figure(title='Employee trend',
plot_width=800,
plot_height=350,
x_axis_label='Month-Year', y_axis_label='No of Employees',
x_axis_type='datetime')
p.line(x= temp_df.year_month,
y = temp_df.emp_count)
show(p)
Does anyone know how to plot year-month on x-axis using bokeh?
I think I found the problem: you should convert the column to datetime.
df['year_month']=pd.to_datetime(df['year_month'])
This should change your column values as shown below (the day defaults to 01):
year_month emp_count
0 2015-09-01 1450425
1 2015-10-01 3093811
2 2015-11-01 3316241
3 2015-12-01 3308658
4 2016-01-01 3402191
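The conversion above can be sketched end to end as follows. This is only a minimal sketch, assuming the column holds either 'YYYY-MM' strings or monthly Period values; the names from_strings and from_periods are illustrative and not from the original post.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "year_month": ["2015-09", "2015-10", "2015-11", "2015-12", "2016-01"],
    "emp_count": [1450425, 3093811, 3316241, 3308658, 3402191],
})

# Route 1: the column holds 'YYYY-MM' strings.
# An explicit format avoids relying on format inference; the day defaults to 01.
from_strings = pd.to_datetime(df["year_month"], format="%Y-%m")

# Route 2: the column holds monthly Period values (as produced by to_period('M')).
# to_timestamp() maps each period to its start, i.e. the first day of the month.
periods = pd.PeriodIndex(df["year_month"], freq="M")
from_periods = periods.to_timestamp()

print(from_strings.tolist())
```

Either result is a datetime64 column, which is what a figure created with x_axis_type='datetime' expects.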
Then the plot should work. I tested it on dummy data, shown below.
Value month_year
2 2018-11-01
3 2018-01-01
4 2018-02-01
5 2018-05-01
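import pandas as pd
from bokeh.plotting import figure, show

sample = pd.read_csv('sample.csv')
# Parse the month_year strings into datetime values, which bokeh can serialize
sample['month_year'] = pd.to_datetime(sample['month_year'])
# Note: bokeh 3.0 and later use width/height instead of plot_width/plot_height
p = figure(title='Employee trend',
           plot_width=800,
           plot_height=350,
           x_axis_label='Month-Year', y_axis_label='No of Employees',
           x_axis_type='datetime')
p.scatter(x=sample.month_year,
          y=sample.Value)
show(p)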
Let me know if this works.
Thanks
I solved the issue with an alternate approach. Thanks to @Samira for the inspiration.
I extracted the year and month from the date object and defaulted the day to 1.
df = df.join(df.as_of_date.apply(lambda x : pd.Series({
'day': x.day,
'year':x.year,
'month': x.month,
'year_month': x.to_period('M'),
'year_month_01': pd.Timestamp(x.year, x.month, 1)
})))
After that I used 'year_month_01' on the x-axis, and the bokeh graph looks as expected.
bokeh graph
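The same first-of-month column can probably be produced without a row-wise apply. The following is a minimal sketch, assuming as_of_date is already a datetime64 column; the sample dates are made up for illustration.

```python
import pandas as pd

# Hypothetical sample dates; only the column name as_of_date comes from the post
df = pd.DataFrame({
    "as_of_date": pd.to_datetime(["2015-09-17", "2015-10-03", "2016-01-31"]),
})

# Monthly Period values, equivalent to x.to_period('M') applied per row
df["year_month"] = df["as_of_date"].dt.to_period("M")

# First day of each month as a regular timestamp, usable on a datetime axis
df["year_month_01"] = df["year_month"].dt.to_timestamp()

print(df)
```

The vectorized .dt accessors avoid building a pd.Series per row, which tends to matter on larger frames.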
I've got a spreadsheet that's automatically being populated by a form. But when I try to make graphs out of the data I always get some kind of error...
The timeline diagram should support date/time format, but I've even tried it with just a date format, converting it to decimals using =DateValue(), other graph types, ...
This is a screenshot of the data and error
The data in the timestamp column is date/time and the data in the time column is a number.
Yet the chart isn't rendering...
Timestamp time
23-12-2020 9:31:44 0.16
23-12-2020 11:06:08 0.75
23-12-2020 11:55:27 0.24
23-12-2020 12:14:30 0.12
23-12-2020 15:18:25 0.73
23-12-2020 17:17:46 0.6
24-12-2020 13:33:49 0.16
24-12-2020 13:51:57 0.01
24-12-2020 15:28:08 1.21
24-12-2020 17:38:36 0.11
24-12-2020 23:40:46 0.15
25-12-2020 11:34:45 0.13
25-12-2020 15:51:53 0.16
25-12-2020 16:08:12 0.06
26-12-2020 11:01:35 0.75
26-12-2020 11:52:03 0.18
26-12-2020 12:24:22 0.15
Can anyone help me out here?
Copy of the sheet:
https://docs.google.com/spreadsheets/d/1qJrC55_EPcTZ7nMPsscU69MLPVclImjBq3Ij-nG_P7I/edit?usp=sharing
The issue is with your B column. You are using Netherlands locale settings, where:
0.16 > not a number
0,16 > a valid number
You have two options. Either:
change the locale to United Kingdom or USA
delete the chart
select column B and format it as Number
select column A and format it as Date time
create a timeline chart with range A:B
Or:
replace the dot (.) with a comma (,) in column B
delete the chart
format column B as Number
format column A as Date time
create a timeline chart with range A:B
This is happening because the locale is not properly defined. Go to File > Settings and change Locale to your language. This will ensure that Sheets understands your datetime format. You may need to add the data again to be detected properly. If you have the data without formatting, you’ll know that it was read properly because the datetime will be aligned to the right.
References
Set a spreadsheet’s location & calculation settings
Row 1: cell A is a concat of the date in B and the time in C. I generate these with CTRL+: and CTRL+SHIFT+: respectively. Google sheets does not treat this like a timestamp on the x axis of charts
Row 2: I discovered CTRL+ALT+SHIFT+: to do a full timestamp, now it has a real timestamp
The issue is, I have many rows of recorded data of the type in Row 1 -- is there any way to convert this into a 'time' format that Google Sheets will respect on the x-axis of charts? Using VALUE() just gives the date portion of the timestamp.
Kind of crazy how much trouble this is causing me, is there really no date_parse(string_format) type function I can call?
EDIT:
this is ridiculous, just going to export and use python
Instead of VALUE, use TIMEVALUE and then format the cell as Time.
Or:
=TEXT(TIMEVALUE(A1); "hh:mm:ss")
As an array formula:
=ARRAYFORMULA(IF(A1:A="",,TEXT(TIMEVALUE(A1:A); "hh:mm:ss")))
To get the date part of the timestamp, use DATEVALUE.
I need to find the points in a series whose X value (time in milliseconds) falls in a given year and month, say Jan 1984. What is the most efficient way?
You can find a point by x value in this way:
var series = chart.series[0],
index = series.xData.indexOf(1553779800000),
point = series.points[index];
Live demo: https://jsfiddle.net/BlackLabel/h9qye01r/
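The snippet above matches a single exact x value. For a whole month, one option is to compute the month's start and end in epoch milliseconds and select every x inside that half-open range. The sketch below shows that idea in Python rather than JavaScript; it assumes the x values are sorted ascending and in UTC, and the helper names month_bounds_ms and indices_in_month are hypothetical. The same comparison could be run over series.xData in the chart.

```python
from bisect import bisect_left
from datetime import datetime, timezone


def month_bounds_ms(year, month):
    """Return (start, end) of the month in epoch milliseconds, end exclusive."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    if month == 12:
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        end = datetime(year, month + 1, 1, tzinfo=timezone.utc)
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000)


def indices_in_month(x_ms, year, month):
    """Indices of the sorted x values (ms) that fall inside the given month."""
    lo, hi = month_bounds_ms(year, month)
    # Binary search on both bounds: O(log n) instead of scanning every point
    return range(bisect_left(x_ms, lo), bisect_left(x_ms, hi))


# Made-up x values around January 1984
x_ms = [441763199999, 441763200000, 443000000000, 444441599999, 444441600000]
print(list(indices_in_month(x_ms, 1984, 1)))
```

Binary search only helps if the x data is sorted; for unsorted data a plain filter over all values does the same job in linear time.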
I'm using Tableau Desktop, my data are like this:
KPI,date,monthValue
coffee break,01/06/2015,10.50
coffee break,01/07/2015,8.30
and I want to build a table like this
KPI, year(date), last value
coffee break, 2015, 8.30
How can I set a calculated field in order to show me the last value available in that year? I tried to do:
LOOKUP([MonthValue], LAST())
But it didn't work and tells me 'cannot mix aggregate and non-aggregate', so I did:
LOOKUP(sum([MonthValue]), LAST())
But that didn't work either. How should I proceed?
If you are using Tableau 9 then you can do this with an LOD calc that looks for the max value in your date field and then checks if the current date value is the same as the max date value.
[Date] == {fixed: max([Date])}
As you can see in the example below when you use the calc as a filter you will only get the last row from your example above.
UPDATE: to get the values per year you can do something like:
Here I am using a table calculation to find the max date per year and then ranking those dates and filtering down to the latest date in each year (which will be the one that has a rank equal to 1).
!max date is WINDOW_MAX(ATTR(Date))
!rank is RANK(Date)
You need to make sure that the table calculations are computed in the correct way (in this case across the values of each year).
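The "latest row per year" logic described above can also be illustrated outside Tableau. The following pandas sketch is not Tableau syntax; it only mirrors the idea of keeping the row with the maximum date within each year. It assumes the dates in the question are day-first (dd/mm/yyyy), which the question does not state.

```python
import pandas as pd

# Data copied from the question
df = pd.DataFrame({
    "KPI": ["coffee break", "coffee break"],
    "date": ["01/06/2015", "01/07/2015"],
    "monthValue": [10.50, 8.30],
})

# Assumption: day-first dates, i.e. 1 June 2015 and 1 July 2015
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
df["year"] = df["date"].dt.year

# Sort by date, then keep the last row of each (KPI, year) group
last_per_year = (
    df.sort_values("date")
      .groupby(["KPI", "year"], as_index=False)
      .last()
)

print(last_per_year[["KPI", "year", "monthValue"]])
```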
I'm trying to use epoch time dates in my series data. The array looks like this:
data:[ [1324857600,205.4],[1324771200,208.7],[1324684800,205.4]. . .]
The points display fine, but the date labels on the x-axis and tooltip are all set to 16 Jan 1970 (the beginning of epoch time!).
If I do a bunch of string-fu I can produce an array that looks like this:
data:[ [Date.UTC(2011, 11, 26),247.7],[Date.UTC(2011, 11, 25),245.5] . . .]
When I do it this way the date labels on the x-axis are correct.
I've tried using the dateTimeLabelFormat option and it formats the date correctly - it's just that when I try to use millisecond values all I get is 16 Jan 70.
Any ideas? I'd rather work with milliseconds than jump through all the hoops to produce "Date.UTC(2011, 11, 26)."
Thanks!
Found the answer on the Highsoft forum.
I need to multiply the epoch time values by 1000 to get the proper millisecond values for Highcharts.
Works great!
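The effect of the missing factor of 1000 can be checked with a short sketch. It is written in Python rather than JavaScript and uses the first values from the question; the variable names are illustrative.

```python
from datetime import datetime, timezone

# Epoch values in seconds, as in the question
data_seconds = [[1324857600, 205.4], [1324771200, 208.7], [1324684800, 205.4]]

# Convert each timestamp from seconds to milliseconds
data_ms = [[t * 1000, v] for t, v in data_seconds]

# Reading the raw seconds value as if it were milliseconds lands in January 1970
misread = datetime.fromtimestamp(data_seconds[0][0] / 1000, tz=timezone.utc)

# Reading the converted value as milliseconds gives the intended date
correct = datetime.fromtimestamp(data_ms[0][0] / 1000, tz=timezone.utc)

print(misread.date(), correct.date())
```

The misread value falls on 16 Jan 1970, which matches the label seen on the axis, while the converted value falls on 26 Dec 2011, matching Date.UTC(2011, 11, 26).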