Is there a way to generate an SRT file (or similar) using Google Cloud Speech?

To generate subtitles for my videos, I converted them to audio files and ran them through Cloud Speech-to-Text. It works, but it only produces transcriptions, whereas I need a *.srt, *.vtt, or similar file.
What I need is what YouTube does: generate transcriptions and sync them with the video in a subtitle format, i.e. transcriptions with the times at which each caption should appear.
I could upload the videos to YouTube and download the auto-generated captions, but that feels like a workaround rather than a proper solution.
Is there a way to generate an SRT file (or similar) using Google Cloud Speech?

There's no real way to do this directly from the Speech-to-Text API. What you could do instead is some post-processing on the speech recognition result.
For example, here's a request to the REST API using a model meant to transcribe video, with a public Google-provided sample file:
curl -s -H "Content-Type: application/json" \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
https://speech.googleapis.com/v1p1beta1/speech:longrunningrecognize \
--data "{
'config': {
'encoding': 'LINEAR16',
'sampleRateHertz': 16000,
'languageCode': 'en-US',
'enableWordTimeOffsets': true,
'enableAutomaticPunctuation': true,
'model': 'video'
},
'audio': {
'uri':'gs://cloud-samples-tests/speech/Google_Gnome.wav'
}
}"
The above uses asynchronous recognition (speech:longrunningrecognize), which is better suited to larger files. Enabling punctuation ('enableAutomaticPunctuation': true) together with word start and end times ('enableWordTimeOffsets': true) lets you take the start time of the first word and the end time of the last word of each sentence (which you'd also have to convert from seconds and nanos to timestamps) and build a text file in the SRT format. You would probably also need some rules about the maximum length of a sentence shown on screen at any given time.
This should not be too difficult to implement; however, there's a strong possibility that you would still run into timing/synchronization issues.
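The post-processing described above could look roughly like the sketch below. It assumes each word is a dict with 'word', 'startTime' and 'endTime' keys, with offsets given as strings such as '1.300s'. The function names and the 42-character limit are illustrative choices, not part of the API.

```python
def parse_offset(offset):
    """Convert an offset string such as '1.300s' to seconds as a float."""
    return float(offset.rstrip('s'))


def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm, the timestamp form SRT uses."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{millis:03}"


def words_to_srt(words, max_chars=42):
    """Group words into cues, breaking at sentence punctuation or at max_chars."""
    cues, current = [], []
    for word in words:
        candidate = ' '.join(w['word'] for w in current + [word])
        # Flush first if adding this word would make the cue too long.
        if current and len(candidate) > max_chars:
            cues.append(current)
            current = []
        current.append(word)
        # Automatic punctuation marks the end of a sentence.
        if word['word'].endswith(('.', '?', '!')):
            cues.append(current)
            current = []
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = srt_timestamp(parse_offset(cue[0]['startTime']))
        end = srt_timestamp(parse_offset(cue[-1]['endTime']))
        text = ' '.join(w['word'] for w in cue)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # Each block ends with a newline, so joining with '\n' leaves a blank line between cues.
    return '\n'.join(blocks)
```

A cue here starts at its first word and ends at its last word, so long pauses inside a sentence are not split; that is one place where the timing issues mentioned above may still show up.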

There is no way to do it using Google Cloud itself, but as suggested you can post-process the result.
Here is some quick code that roughly does the job. You may want to adapt it to your needs:
function convertGSTTToSRT(string) {
  const obj = JSON.parse(string);
  let i = 1;
  let result = '';
  for (const line of obj.response.results) {
    const alternative = line.alternatives[0];
    // Skip results that carry no word-level timing.
    if (!alternative.words || alternative.words.length === 0) continue;
    const firstWord = alternative.words[0];
    const lastWord = alternative.words[alternative.words.length - 1];
    result += i++ + '\n';
    result += formatTime(convertSecondStringToRealtime(firstWord.startTime)) + ' --> ';
    result += formatTime(convertSecondStringToRealtime(lastWord.endTime)) + '\n';
    result += alternative.transcript.trim() + '\n\n';
  }
  return result;
}
function formatTime(time) {
  return String(time.hours).padStart(2, '0') + ':' + String(time.minutes).padStart(2, '0') + ':' +
    String(time.seconds).padStart(2, '0') + ',' + String(time.millis).padStart(3, '0');
}
function convertSecondStringToRealtime(string) {
  // Offsets arrive as strings such as "12.300s"; parseFloat ignores the trailing "s".
  const totalMillis = Math.round(parseFloat(string) * 1000);
  const hours = Math.floor(totalMillis / 3600000);
  const minutes = Math.floor((totalMillis % 3600000) / 60000);
  const seconds = Math.floor((totalMillis % 60000) / 1000);
  const millis = totalMillis % 1000;
  return { hours, minutes, seconds, millis };
}

Here is the code I used:
import json

def to_seconds(offset):
    # Offsets look like '12.300s' or '7s'.
    return float(offset.rstrip('s'))

def to_srt_time(offset):
    # SRT timestamps have the form HH:MM:SS,mmm.
    millis = int(round(to_seconds(offset) * 1000))
    s, ms = divmod(millis, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return '{:02}:{:02}:{:02},{:03}'.format(h, m, s, ms)

def make_cue(counter, words, text):
    # One SRT entry spanning the first to the last of the given words.
    return '{}\n{} --> {}\n{}\n\n'.format(
        counter, to_srt_time(words[0]['startTime']), to_srt_time(words[-1]['endTime']), text)

def srt_generation(filepath, filename):
    with open('{}{}.json'.format(filepath, filename), 'r', encoding='utf-8') as file:
        data = json.load(file)
    results = data['response']['annotationResults'][0]['speechTranscriptions']
    lines = []
    wordlist = []
    for transcription in results:
        alternative = transcription['alternatives'][0]
        if 'transcript' in alternative and alternative.get('words'):
            lines.append(make_cue(len(lines) + 1, alternative['words'], alternative['transcript'].strip()))
            wordlist.extend(alternative['words'])
    with open('{}{}.srt'.format(filepath, filename), 'w', encoding='utf-8') as srtfile:
        srtfile.writelines(lines)
    # Now generate chunks of roughly 3 seconds from those words.
    standard_duration = 3
    lines = []
    chunk = []
    for word in wordlist:
        # Start a new chunk once this word would push the current one past the limit.
        if chunk and to_seconds(word['endTime']) - to_seconds(chunk[0]['startTime']) > standard_duration:
            lines.append(make_cue(len(lines) + 1, chunk, ' '.join(w['word'] for w in chunk)))
            chunk = []
        chunk.append(word)
    if chunk:
        lines.append(make_cue(len(lines) + 1, chunk, ' '.join(w['word'] for w in chunk)))
    with open('{}{}_3_Sec_Custom.srt'.format(filepath, filename), 'w', encoding='utf-8') as srtfile:
        srtfile.writelines(lines)

Use this request parameter "enable_word_time_offsets: True" to get the time stamps for the word groups. Then create an srt programmatically.

If you require a *.vtt file, here is a snippet to convert the API response received from GCP speech-to-text client into a valid *.vtt. Some answers above are for *.srt so sharing this here.
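const speech = require('@google-cloud/speech');
const client = new speech.SpeechClient();
// request is your recognize request, with enableWordTimeOffsets: true in its config.
const [response] = await client.recognize(request);
const vtt = createVTT(response);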
function createVTT(response) {
const wordsArray = response.results[0].alternatives[0].words;
let VTT = '';
let buffer = [];
const phraseLength = 10;
let startPointer = '00:00:00';
let endPointer = '00:00:00';
VTT += 'WEBVTT\n\n';
wordsArray.forEach((wordItem) => {
const { startTime, endTime, word } = wordItem;
const start = startTime.seconds;
const end = endTime.seconds;
if (buffer.length === 0) {
// first word of the phrase
startPointer = secondsToFormat(start);
}
if (buffer.length < phraseLength) {
buffer.push(word);
}
if (buffer.length === phraseLength) {
endPointer = secondsToFormat(end);
const phrase = buffer.join(' ');
VTT += `${startPointer + ' --> ' + endPointer}\n`;
VTT += `${phrase}\n\n`;
buffer = [];
}
});
if (buffer.length) {
// handle the left over buffer items
const lastItem = wordsArray[wordsArray.length - 1];
const end = lastItem.endTime.seconds;
endPointer = secondsToFormat(end);
const phrase = buffer.join(' ');
VTT += `${startPointer + ' --> ' + endPointer}\n`;
VTT += `${phrase}\n\n`;
}
return VTT;
}
function secondsToFormat(seconds) {
  // seconds may arrive as a string, so coerce it to a number first.
  const total = Number(seconds);
  const timeHours = Math.floor(total / 3600).toString().padStart(2, '0');
  // Take the minutes left over after removing whole hours.
  const timeMinutes = Math.floor((total % 3600) / 60).toString().padStart(2, '0');
  const timeSeconds = (total % 60).toString().padStart(2, '0');
  const formattedTime = timeHours + ':' + timeMinutes + ':' + timeSeconds + '.000';
  return formattedTime;
}
Note: enableWordTimeOffsets: true must be set, as already mentioned above. This answer is for people who want a .vtt file.
Hope this was helpful to someone :)
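The snippet above always writes '.000' for the milliseconds. A minimal sketch of a timestamp helper that keeps the sub-second part is shown below in Python. It assumes each offset comes as a seconds value plus a nanos value, as the startTime.seconds access above suggests; the nanos field and the function names are assumptions made for illustration.

```python
def vtt_timestamp(seconds, nanos=0):
    """Format a seconds/nanos pair as HH:MM:SS.mmm, the WebVTT timestamp form."""
    # int() also accepts numeric strings, in case the client returns seconds as text.
    total_millis = int(seconds) * 1000 + int(nanos) // 1_000_000
    hours, rest = divmod(total_millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02}.{millis:03}"


def vtt_cue(start, end, text):
    """Build one WebVTT cue from (seconds, nanos) tuples and the cue text."""
    return f"{vtt_timestamp(*start)} --> {vtt_timestamp(*end)}\n{text}\n\n"


def build_vtt(cues):
    """Join (start, end, text) triples into a WebVTT document."""
    return "WEBVTT\n\n" + "".join(vtt_cue(*cue) for cue in cues)
```

Unlike SRT, WebVTT uses a dot before the milliseconds and needs the WEBVTT header line, which is why the two formats get separate formatters.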

Related

Extract URL from copied text in Google Sheets [duplicate]

I have a sheet where a hyperlink is set on a cell, but not through a formula. When I click the cell, the "fx" bar shows only the value.
I searched the web, but everywhere the advice is to extract the hyperlink using getFormula().
In my case there is no formula set at all.
The cell displays as a hyperlink, but the URL does not appear in the formula ("fx") bar.
How can I get the hyperlink of that cell using Apps Script or a formula?
When a cell has only one URL, you can retrieve the URL from the cell using the following simple script.
var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Sheet1");
var url = sheet.getRange("A2").getRichTextValue().getLinkUrl();
Source: https://gist.github.com/tanaikech/d39b4b5ccc5a1d50f5b8b75febd807a6
This situation also occurs when an Excel file containing cells with hyperlinks is converted to a Google Spreadsheet. In my case, I retrieve the URLs using the Sheets API. A sample script follows. There may be several solutions, so please think of this as one of them.
To use this script, please enable the Sheets API in Advanced Google Services and in the API console.
Sample script:
var spreadsheetId = "### spreadsheetId ###";
var res = Sheets.Spreadsheets.get(spreadsheetId, {ranges: "Sheet1!A1:A10", fields: "sheets/data/rowData/values/hyperlink"});
var sheets = res.sheets;
for (var i = 0; i < sheets.length; i++) {
var data = sheets[i].data;
for (var j = 0; j < data.length; j++) {
var rowData = data[j].rowData;
for (var k = 0; k < rowData.length; k++) {
var values = rowData[k].values;
for (var l = 0; l < values.length; l++) {
Logger.log(values[l].hyperlink) // You can see the URL here.
}
}
}
}
Note:
Please set spreadsheetId.
Sheet1!A1:A10 is a sample. Please set the range for your situation.
In this case, each element of rowData corresponds to a row index, and each element of values corresponds to a column index.
References:
Method: spreadsheets.get
If this was not what you want, please tell me. I would like to modify it.
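To make the nesting described in the note concrete, here is a small offline sketch in Python that walks a response shaped like the one the sample script reads (sheets, data, rowData, values, hyperlink). It works on a plain dict rather than calling the API, and the function name is hypothetical. It also tolerates rows with no values and cells with no hyperlink, which is an assumption about how sparse responses may look.

```python
def extract_hyperlinks(response):
    """Return one list of URLs (None where a cell has no link) per row."""
    rows = []
    for sheet in response.get('sheets', []):
        for grid in sheet.get('data', []):
            # Each rowData element is one row; each values element is one column.
            for row in grid.get('rowData', []):
                rows.append([cell.get('hyperlink') for cell in row.get('values', [])])
    return rows


sample = {
    'sheets': [{
        'data': [{
            'rowData': [
                {'values': [{'hyperlink': 'https://example.com'}]},
                {},                   # a row with no cell data
                {'values': [{}]},     # a cell without a hyperlink
            ]
        }]
    }]
}
print(extract_hyperlinks(sample))
```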
Hey all,
I hope this helps you save some dev time, as it was a rather slippery one to pin down...
This custom function will take all hyperlinks in a Google Sheets cell, and return them as text formatted based on the second parameter as either [JSON|HTML|NAMES_ONLY|URLS_ONLY].
Parameters:
cellRef : You must provide an A1 style cell reference to a cell.
Hint: To do this within a cell without hard-coding
a string reference, you can use the CELL function.
eg: "=convertCellLinks(CELL("address",C3))"
style : Defines the formatting of the output string.
Valid arguments are : [JSON|HTML|NAMES_ONLY|URLS_ONLY].
Sample Script
/**
* Custom Google Sheet Function to convert rich-text
* links into Readable links.
* Author: Isaac Dart ; 2022-01-25
*
* Params
* cellRef : You must provide an A1 style cell reference to a cell.
* Hint: To do this within a cell without hard-coding
* a string reference, you can use the CELL function.
* eg: "=convertCellLinks(CELL("address",C3))"
*
* style : Defines the formatting of the output string.
* Valid arguments are : [JSON|HTML|NAMES_ONLY|URLS_ONLY].
*
*/
function convertCellLinks(cellRef = "H2", style = "JSON") {
var ss = SpreadsheetApp.getActiveSpreadsheet();
var sheet = SpreadsheetApp.getActiveSheet();
var cell = sheet.getRange(cellRef).getCell(1,1);
var runs = cell.getRichTextValue().getRuns();
var ret = "";
var lf = String.fromCharCode(10);
runs.map(r => {
var _url = r.getLinkUrl();
var _text = r.getText();
if (_url !== null && _text !== null) {
_url = _url.trim(); _text = _text.trim();
if (_url.length > 0 && _text.length > 0) {
switch(style.toUpperCase()) {
case "HTML": ret += '<a href="' + _url + '">' + _text + '</a>' + lf; break;
case "TEXT": ret += _text + ' : "' + _url + '"' + lf; break;
case "NAMES_ONLY" : ret += _text + lf; break;
case "URLS_ONLY" : ret += _url + lf; break;
//JSON default : ...
default: ret += (ret.length>0?(','+ lf): '') +'{name : "' + _text + '", url : "' + _url + '"}' ; break;
}
ret += lf;
}
}
});
if (style.toUpperCase() == "JSON") ret = '[' + ret + ']';
//Logger.log(ret);
return ret;
}
Cheers,
Isaac
I tried solution 2:
var urls = sheet.getRange('A1:A10').getRichTextValues().map( r => r[0].getLinkUrl() ) ;
I got some links, but most of them yielded null.
I made a shorter version of solution 1, which yielded all the links.
const id = SpreadsheetApp.getActive().getId() ;
let res = Sheets.Spreadsheets.get(id,
{ranges: "Sheet1!A1:A10", fields: "sheets/data/rowData/values/hyperlink"});
var urls = res.sheets[0].data[0].rowData.map(r => r.values[0].hyperlink) ;

Google AdWords script issue

I am trying to set up an AdWords script for the first time but cannot get it to function properly. It is essentially supposed to crawl all of our clients' AdWords accounts, send all of the account information to a Google Sheet, and send me an email to let me know if there are any anomalies detected in any of the accounts. I have not had any luck getting the info sent to the Google Sheet, let alone an email notification. This is the primary error I'm currently getting when I preview the script: TypeError: Cannot call method "getValue" of null. (line 510)
Here is the Google resource page (https://developers.google.com/adwords/scripts/docs/solutions/mccapp-account-anomaly-detector) for the script and the actual script itself that I'm using is below.
Any recommendations on how to get this to function properly would be greatly appreciated. Thank you!
// Copyright 2017, Google Inc. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
/**
* @name MCC Account Anomaly Detector
*
* @fileoverview The MCC Account Anomaly Detector alerts the advertiser whenever
* one or more accounts in a group of advertiser accounts under an MCC account
* is suddenly behaving too differently from what's historically observed. See
* https://developers.google.com/adwords/scripts/docs/solutions/mccapp-account-anomaly-detector
* for more details.
*
* @author AdWords Scripts Team [adwords-scripts@googlegroups.com]
*
* @version 1.4
*
* @changelog
* - version 1.4
* - Added conversions to tracked statistics.
* - version 1.3.2
* - Added validation for external spreadsheet setup.
* - version 1.3.1
* - Improvements to time zone handling.
* - version 1.3
* - Cleanup the script a bit for easier debugging and maintenance.
* - version 1.2
* - Added AdWords API report version.
* - version 1.1
* - Fix the script to work in accounts where there is no stats.
* - version 1.0
* - Released initial version.
*/
var SPREADSHEET_URL = 'https://docs.google.com/a/altitudemarketing.com/spreadsheets/d/1ELWZPcGLqf7n9GDnTx5o7xWOFZHVbgaLakeXAu5NY-E/edit?usp=sharing';
var CONFIG = {
// Uncomment below to include an account label filter
// ACCOUNT_LABEL: 'High Spend Accounts'
};
var CONST = {
FIRST_DATA_ROW: 12,
FIRST_DATA_COLUMN: 2,
MCC_CHILD_ACCOUNT_LIMIT: 50,
TOTAL_DATA_COLUMNS: 9
};
var STATS = {
'NumOfColumns': 4,
'Impressions':
{'Column': 3, 'Color': 'red', 'AlertRange': 'impressions_alert'},
'Clicks': {'Column': 4, 'Color': 'orange', 'AlertRange': 'clicks_alert'},
'Conversions':
{'Column': 5, 'Color': 'dark yellow 2', 'AlertRange': 'conversions_alert'},
'Cost': {'Column': 6, 'Color': 'yellow', 'AlertRange': 'cost_alert'}
};
var DAYS = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
'Saturday', 'Sunday'];
/**
* Configuration to be used for running reports.
*/
var REPORTING_OPTIONS = {
// Comment out the following line to default to the latest reporting version.
apiVersion: 'v201605'
};
function main() {
var account;
var alertText = [];
Logger.log('Using spreadsheet - %s.', SPREADSHEET_URL);
var spreadsheet = validateAndGetSpreadsheet(SPREADSHEET_URL);
spreadsheet.setSpreadsheetTimeZone(AdWordsApp.currentAccount().getTimeZone());
var dataRow = CONST.FIRST_DATA_ROW;
SheetUtil.setupData(spreadsheet);
Logger.log('MCC account: ' + mccManager.mccAccount().getCustomerId());
while (account = mccManager.next()) {
Logger.log('Processing account ' + account.getCustomerId());
alertText.push(processAccount(account, spreadsheet, dataRow));
dataRow++;
}
sendEmail(mccManager.mccAccount(), alertText, spreadsheet);
}
/**
* For each of Impressions, Clicks, Conversions, and Cost, check to see if the
* values are out of range. If they are, and no alert has been set in the
* spreadsheet, then 1) Add text to the email, and 2) Add coloring to the cells
* corresponding to the statistic.
*
* @return {string} the next piece of the alert text to include in the email.
*/
function processAccount(account, spreadsheet, startingRow) {
var sheet = spreadsheet.getSheets()[0];
var thresholds = SheetUtil.thresholds();
var today = AdWordsApp.report(SheetUtil.getTodayQuery(), REPORTING_OPTIONS);
var past = AdWordsApp.report(SheetUtil.getPastQuery(), REPORTING_OPTIONS);
var hours = SheetUtil.hourOfDay();
var todayStats = accumulateRows(today.rows(), hours, 1); // just one week
var pastStats = accumulateRows(past.rows(), hours, SheetUtil.weeksToAvg());
var alertText = ['Account ' + account.getCustomerId()];
var validWhite = ['', 'white', '#ffffff']; // these all count as white
// Colors cells that need alerting, and adds text to the alert email body.
function generateAlert(field, emailAlertText) {
// There are 2 cells to check, for Today's value and Past value
var bgRange = [
sheet.getRange(startingRow, STATS[field].Column, 1, 1),
sheet.getRange(startingRow, STATS[field].Column + STATS.NumOfColumns,
1, 1)
];
var bg = [bgRange[0].getBackground(), bgRange[1].getBackground()];
// If both backgrounds are white, change background Colors
// and update most recent alert time.
if ((-1 != validWhite.indexOf(bg[0])) &&
(-1 != validWhite.indexOf(bg[1]))) {
bgRange[0].setBackground([[STATS[field]['Color']]]);
bgRange[1].setBackground([[STATS[field]['Color']]]);
spreadsheet.getRangeByName(STATS[field]['AlertRange']).
setValue('Alert at ' + hours + ':00');
alertText.push(emailAlertText);
}
}
if (thresholds.Impressions &&
todayStats.Impressions < pastStats.Impressions * thresholds.Impressions) {
generateAlert('Impressions',
' Impressions are too low: ' + todayStats.Impressions +
' Impressions by ' + hours + ':00, expecting at least ' +
parseInt(pastStats.Impressions * thresholds.Impressions));
}
if (thresholds.Clicks &&
todayStats.Clicks < (pastStats.Clicks * thresholds.Clicks).toFixed(1)) {
generateAlert('Clicks',
' Clicks are too low: ' + todayStats.Clicks +
' Clicks by ' + hours + ':00, expecting at least ' +
(pastStats.Clicks * thresholds.Clicks).toFixed(1));
}
if (thresholds.Conversions &&
todayStats.Conversions <
(pastStats.Conversions * thresholds.Conversions).toFixed(1)) {
generateAlert(
'Conversions',
' Conversions are too low: ' + todayStats.Conversions +
' Conversions by ' + hours + ':00, expecting at least ' +
(pastStats.Conversions * thresholds.Conversions).toFixed(1));
}
if (thresholds.Cost &&
todayStats.Cost > (pastStats.Cost * thresholds.Cost).toFixed(2)) {
generateAlert(
'Cost',
' Cost is too high: ' + todayStats.Cost + ' ' +
account.getCurrencyCode() + ' by ' + hours +
':00, expecting at most ' +
(pastStats.Cost * thresholds.Cost).toFixed(2));
}
// If no alerts were triggered, we will have only the heading text. Remove it.
if (alertText.length == 1) {
alertText = [];
}
var dataRows = [[
account.getCustomerId(), todayStats.Impressions, todayStats.Clicks,
todayStats.Conversions, todayStats.Cost, pastStats.Impressions.toFixed(0),
pastStats.Clicks.toFixed(1), pastStats.Conversions.toFixed(1),
pastStats.Cost.toFixed(2)
]];
sheet.getRange(startingRow, CONST.FIRST_DATA_COLUMN,
1, CONST.TOTAL_DATA_COLUMNS).setValues(dataRows);
return alertText;
}
var SheetUtil = (function() {
var thresholds = {};
var upToHour = 1; // default
var weeks = 26; // default
var todayQuery = '';
var pastQuery = '';
var setupData = function(spreadsheet) {
Logger.log('Running setupData');
spreadsheet.getRangeByName('date').setValue(new Date());
spreadsheet.getRangeByName('account_id').setValue(
mccManager.mccAccount().getCustomerId());
var getThresholdFor = function(field) {
thresholds[field] = parseField(spreadsheet.
getRangeByName(field).getValue());
};
getThresholdFor('Impressions');
getThresholdFor('Clicks');
getThresholdFor('Conversions');
getThresholdFor('Cost');
var now = new Date();
// Basic reporting statistics are usually available with no more than a 3-hour
// delay.
var upTo = new Date(now.getTime() - 3 * 3600 * 1000);
upToHour = parseInt(getDateStringInTimeZone('h', upTo));
spreadsheet.getRangeByName('timestamp').setValue(
DAYS[getDateStringInTimeZone('u', now)] + ', ' + upToHour + ':00');
if (upToHour == 1) {
// First run of the day, clear existing alerts.
spreadsheet.getRangeByName(STATS['Clicks']['AlertRange']).clearContent();
spreadsheet.getRangeByName(STATS['Impressions']['AlertRange']).
clearContent();
spreadsheet.getRangeByName(STATS['Conversions']['AlertRange'])
.clearContent();
spreadsheet.getRangeByName(STATS['Cost']['AlertRange']).clearContent();
// Reset background and font Colors for all data rows.
var bg = [];
var ft = [];
var bg_single = [
'white', 'white', 'white', 'white', 'white', 'white', 'white', 'white',
'white'
];
var ft_single = [
'black', 'black', 'black', 'black', 'black', 'black', 'black', 'black',
'black'
];
// Construct a 50-row array of colors to set.
for (var a = 0; a < CONST.MCC_CHILD_ACCOUNT_LIMIT; ++a) {
bg.push(bg_single);
ft.push(ft_single);
}
var dataRegion = spreadsheet.getSheets()[0].getRange(
CONST.FIRST_DATA_ROW, CONST.FIRST_DATA_COLUMN,
CONST.MCC_CHILD_ACCOUNT_LIMIT, CONST.TOTAL_DATA_COLUMNS);
dataRegion.setBackgrounds(bg);
dataRegion.setFontColors(ft);
}
var weeksStr = spreadsheet.getRangeByName('weeks').getValue();
weeks = parseInt(weeksStr.substring(0, weeksStr.indexOf(' ')));
var dateRangeToCheck = getDateStringInPast(0, upTo);
var dateRangeToEnd = getDateStringInPast(1, upTo);
var dateRangeToStart = getDateStringInPast(1 + weeks * 7, upTo);
var fields = 'HourOfDay, DayOfWeek, Clicks, Impressions, Conversions, Cost';
todayQuery = 'SELECT ' + fields +
' FROM ACCOUNT_PERFORMANCE_REPORT DURING ' + dateRangeToCheck + ',' +
dateRangeToCheck;
pastQuery = 'SELECT ' + fields +
' FROM ACCOUNT_PERFORMANCE_REPORT WHERE DayOfWeek=' +
DAYS[getDateStringInTimeZone('u', now)].toUpperCase() +
' DURING ' + dateRangeToStart + ',' + dateRangeToEnd;
};
var getThresholds = function() { return thresholds; };
var getHourOfDay = function() { return upToHour; };
var getWeeksToAvg = function() { return weeks; };
var getPastQuery = function() { return pastQuery; };
var getTodayQuery = function() { return todayQuery; };
// The SheetUtil public interface.
return {
setupData: setupData,
thresholds: getThresholds,
hourOfDay: getHourOfDay,
weeksToAvg: getWeeksToAvg,
getPastQuery: getPastQuery,
getTodayQuery: getTodayQuery
};
})();
function sendEmail(account, alertTextArray, spreadsheet) {
var bodyText = '';
alertTextArray.forEach(function(alertText) {
// When zero alerts, this is an empty array, which we don't want to add.
if (alertText.length == 0) { return }
bodyText += alertText.join('\n') + '\n\n';
});
bodyText = bodyText.trim();
var email = spreadsheet.getRangeByName('email').getValue();
if (bodyText.length > 0 && email && email.length > 0 &&
email != 'foo@example.com') {
Logger.log('Sending Email');
MailApp.sendEmail(email,
'AdWords Account ' + account.getCustomerId() + ' misbehaved.',
'Your account ' + account.getCustomerId() +
' is not performing as expected today: \n\n' +
bodyText + '\n\n' +
'Log into AdWords and take a look: ' +
'adwords.google.com\n\nAlerts dashboard: ' +
SPREADSHEET_URL);
}
else if (bodyText.length == 0) {
Logger.log('No alerts triggered. No email being sent.');
}
}
function toFloat(value) {
value = value.toString().replace(/,/g, '');
return parseFloat(value);
}
function parseField(value) {
if (value == 'No alert') {
return null;
} else {
return toFloat(value);
}
}
function accumulateRows(rows, hours, weeks) {
var result = {Clicks: 0, Impressions: 0, Conversions: 0, Cost: 0};
while (rows.hasNext()) {
var row = rows.next();
var hour = row['HourOfDay'];
if (hour < hours) {
result = addRow(row, result, 1 / weeks);
}
}
return result;
}
function addRow(row, previous, coefficient) {
if (!coefficient) {
coefficient = 1;
}
if (!row) {
row = {Clicks: 0, Impressions: 0, Conversions: 0, Cost: 0};
}
if (!previous) {
previous = {Clicks: 0, Impressions: 0, Conversions: 0, Cost: 0};
}
return {
Clicks: parseInt(row['Clicks']) * coefficient + previous.Clicks,
Impressions:
parseInt(row['Impressions']) * coefficient + previous.Impressions,
Conversions:
parseInt(row['Conversions']) * coefficient + previous.Conversions,
Cost: toFloat(row['Cost']) * coefficient + previous.Cost
};
}
function checkInRange(today, yesterday, coefficient, field) {
var yesterdayValue = yesterday[field] * coefficient;
if (today[field] > yesterdayValue * 2) {
Logger.log('' + field + ' too much');
} else if (today[field] < yesterdayValue / 2) {
Logger.log('' + field + ' too little');
}
}
/**
* Produces a formatted string representing a date in the past of a given date.
*
* @param {number} numDays The number of days in the past.
* @param {date} date A date object. Defaults to the current date.
* @return {string} A formatted string in the past of the given date.
*/
function getDateStringInPast(numDays, date) {
date = date || new Date();
var MILLIS_PER_DAY = 1000 * 60 * 60 * 24;
var past = new Date(date.getTime() - numDays * MILLIS_PER_DAY);
return getDateStringInTimeZone('yyyyMMdd', past);
}
/**
* Produces a formatted string representing a given date in a given time zone.
*
* @param {string} format A format specifier for the string to be produced.
* @param {date} date A date object. Defaults to the current date.
* @param {string} timeZone A time zone. Defaults to the account's time zone.
* @return {string} A formatted string of the given date in the given time zone.
*/
function getDateStringInTimeZone(format, date, timeZone) {
date = date || new Date();
timeZone = timeZone || AdWordsApp.currentAccount().getTimeZone();
return Utilities.formatDate(date, timeZone, format);
}
/**
* Module that deals with fetching and iterating through multiple accounts.
*
* @return {object} callable functions corresponding to the available
* actions. Specifically, it currently supports next, current, mccAccount.
*/
var mccManager = (function() {
var accountIterator;
var mccAccount;
var currentAccount;
// Private one-time init function.
var init = function() {
var accountSelector = MccApp.accounts();
// Use this to limit the accounts that are being selected in the report.
if (CONFIG.ACCOUNT_LABEL) {
accountSelector.withCondition("LabelNames CONTAINS '" +
CONFIG.ACCOUNT_LABEL + "'");
}
accountSelector.withLimit(CONST.MCC_CHILD_ACCOUNT_LIMIT);
accountIterator = accountSelector.get();
mccAccount = AdWordsApp.currentAccount(); // save the mccAccount
currentAccount = AdWordsApp.currentAccount();
};
/**
* After calling this, AdWordsApp will have the next account selected.
* If there are no more accounts to process, re-selects the original
* MCC account.
*
* @return {AdWordsApp.Account} The account that has been selected.
*/
var getNextAccount = function() {
if (accountIterator.hasNext()) {
currentAccount = accountIterator.next();
MccApp.select(currentAccount);
return currentAccount;
}
else {
MccApp.select(mccAccount);
return null;
}
};
/**
* Returns the currently selected account. This is cached for performance.
*
* @return {AdWords.Account} The currently selected account.
*/
var getCurrentAccount = function() {
return currentAccount;
};
/**
* Returns the original MCC account.
*
* @return {AdWords.Account} The original account that was selected.
*/
var getMccAccount = function() {
return mccAccount;
};
// Set up internal variables; called only once, here.
init();
// Expose the external interface.
return {
next: getNextAccount,
current: getCurrentAccount,
mccAccount: getMccAccount
};
})();
/**
* Validates the provided spreadsheet URL and email address
* to make sure that they're set up properly. Throws a descriptive error message
* if validation fails.
*
* @param {string} spreadsheeturl The URL of the spreadsheet to open.
* @return {Spreadsheet} The spreadsheet object itself, fetched from the URL.
* @throws {Error} If the spreadsheet URL or email hasn't been set
*/
function validateAndGetSpreadsheet(spreadsheeturl) {
if (spreadsheeturl == 'YOUR_SPREADSHEET_URL') {
throw new Error('Please specify a valid Spreadsheet URL. You can find' +
' a link to a template in the associated guide for this script.');
}
var spreadsheet = SpreadsheetApp.openByUrl(spreadsheeturl);
var email = spreadsheet.getRangeByName('email').getValue();
if ('foo@example.com' == email) {
throw new Error('Please either set a custom email address in the' +
' spreadsheet, or set the email field in the spreadsheet to blank' +
' to send no email.');
}
return spreadsheet;
}

AdWords Script With Divergent Data from Webview

I have the following code on an AdWords script:
var campanha = 'xyz';
function main() {
campanhas = AdWordsApp.campaigns().withCondition("CampaignName = '" + campanha + "'").withCondition('Status = ENABLED').get();
while(campanhas.hasNext()) {
campanha = campanhas.next();
if(campanha.getBiddingStrategyType() != 'ENHANCED_CPC') {
Logger.log('Ajuste de lance da campanha inválido: ' +campanha.getBiddingStrategyType());
continue;
}
palavras = campanha.keywords().withCondition('Status = ENABLED').forDateRange("YESTERDAY").get();
while(palavras.hasNext()) {
palavra = palavras.next();
estatisticas = palavra.getStatsFor('YESTERDAY');
lances = palavra.bidding();
estimativa_primeira = Math.max(palavra.getTopOfPageCpc(), palavra.getFirstPageCpc());
posicao = estatisticas.getAveragePosition();
if (posicao > 1) {
Logger.log(palavra.getText() +" = " +" "+palavra.getTopOfPageCpc()+" "+palavra.getFirstPageCpc());
}
//Logger.log(palavra.getText() + ' = '+lances.getCpc()+" = "+estatisticas.getAveragePosition());
}
}
}
I expected to retrieve the estimated first page CPC and the estimated first position CPC, but the values I received are different from those in the AdWords web interface. For example, for a given keyword the script returned +xxx +yyyy = 0.12 0.05, whereas the web interface shows 0.12 for the estimated first page CPC and 0.41 for the estimated first position CPC.

Is there a way in Selenium to verify that the displayed image is correct and has not been changed under the same file name?

I am automating an application where I need to verify the cover image of a book.
I encountered a situation where the cover image changed and my script was not able to report this, since the image source remained the same.
You could check that the hash of the targeted image doesn't change. Here is an example to compute the hash of an image with Selenium / Python:
from selenium import webdriver
from selenium.webdriver.common.by import By
JS_GET_IMAGE_HASH = """
var hash = 0, ele = arguments[0], xhr = new XMLHttpRequest();
var src = ele.src || window.getComputedStyle(ele).backgroundImage;
xhr.open('GET', src.match(/https?:[^\"')]+/)[0], false);
xhr.send();
for (var i = 0, buffer = xhr.response; i < buffer.length; i++)
  hash = (((hash << 5) - hash) + buffer.charCodeAt(i)) | 0;
return hash.toString(16).toUpperCase();
"""
driver = webdriver.Firefox()
driver.get("https://www.google.co.uk/")
# get the logo
ele_image = driver.find_element(By.ID, "hplogo")
# compute the hash of the logo
image_hash = driver.execute_script(JS_GET_IMAGE_HASH, ele_image)
# print the hash code
print(image_hash)
Or with Selenium / Java:
final String JS_GET_IMAGE_HASH =
"var hash = 0, ele = arguments[0], xhr = new XMLHttpRequest(); " +
"var src = ele.src || window.getComputedStyle(ele).backgroundImage; " +
"xhr.open('GET', src.match(/https?:[^\"')]+/)[0], false); " +
"xhr.send(); " +
"for (var i = 0, buffer = xhr.response; i < buffer.length; i++) " +
" hash = (((hash << 5) - hash) + buffer.charCodeAt(i)) | 0; " +
"return hash.toString(16).toUpperCase(); ";
WebDriver driver = new FirefoxDriver();
JavascriptExecutor js = (JavascriptExecutor)driver;
driver.get("https://www.google.co.uk/");
// get the logo
WebElement ele_image = driver.findElement(By.id("hplogo"));
// compute the hash of the logo
String image_hash = (String)js.executeScript(JS_GET_IMAGE_HASH, ele_image);
// print the hash code
System.out.println(image_hash);
You could solve this without any image processing by calculating file checksums.
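As a rough illustration of the checksum idea, the sketch below hashes the raw image bytes with SHA-256 and compares the digest against a stored baseline. How the bytes are obtained (for example by downloading the image URL found through Selenium) is left to the caller, and the function names are hypothetical.

```python
import hashlib


def image_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw image bytes."""
    return hashlib.sha256(data).hexdigest()


def image_changed(data: bytes, expected_digest: str) -> bool:
    """True when the image bytes no longer match the recorded baseline digest."""
    return image_digest(data) != expected_digest


# Record the digest once for the known-good cover, then compare on later runs.
baseline = image_digest(b"bytes of the known-good cover image")
print(image_changed(b"bytes of the known-good cover image", baseline))
print(image_changed(b"bytes of a replaced cover image", baseline))
```

A checksum only reports that the file differs byte for byte; a re-encoded copy of the same picture would also count as changed.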

How can I properly parse an email address with name?

I'm reading email headers (in Node.js, for those keeping score) and they are VERY varied. E-mail addresses in the To field look like:
"Jake Smart" <jake@smart.com>, jack@smart.com, "Development, Business" <bizdev@smart.com>
and a variety of other formats. Is there any way to parse all of this out?
Here's my first stab:
Run a split() on , to break up the different people into an array
For each item, see if there's a < or ".
If there's a <, then parse out the email
If there's a ", then parse out the name
For the name, if there's a ,, then split to get Last, First names.
If I first do a split on the ,, then the Development, Business will cause a split error. Spaces are also inconsistent. Plus, there may be more e-mail address formats that come through in headers that I haven't seen before. Is there any way (or maybe an awesome Node.js library) that will do all of this for me?
There's an npm module for this: mimelib (or mimelib-noiconv if you are on Windows or don't want to compile node-iconv)
npm install mimelib-noiconv
And the usage would be:
var mimelib = require("mimelib-noiconv");
var addressStr = 'jack@smart.com, "Development, Business" <bizdev@smart.com>';
var addresses = mimelib.parseAddresses(addressStr);
console.log(addresses);
// [{ address: 'jack@smart.com', name: '' },
// { address: 'bizdev@smart.com', name: 'Development, Business' }]
The actual formatting for that is pretty complicated, but here is a regex that works. I can't promise it always will work though. https://www.rfc-editor.org/rfc/rfc2822#page-15
const str = "...";
const pat = /(?:"([^"]+)")? ?<?(.*?@[^>,]+)>?,? ?/g;
let m;
while (m = pat.exec(str)) {
const name = m[1];
const mail = m[2];
// Do whatever you need.
}
I'd try and do it all in one iteration (performance). Just threw it together (limited testing):
var header = "\"Jake Smart\" <jake@smart.com>, jack@smart.com, \"Development, Business\" <bizdev@smart.com>";
alert (header);
var info = [];
var current = [];
var state = -1;
var temp = "";
for (var i = 0; i < header.length + 1; i++) {
var c = header[i];
if (state == 0) {
if (c == "\"") {
current.push(temp);
temp = "";
state = -1;
} else {
temp += c;
}
} else if (state == 1) {
if (c == ">") {
current.push(temp);
info.push (current);
current = [];
temp = "";
state = -1;
} else {
temp += c;
}
} else {
if (c == "<"){
state = 1;
} else if (c == "\"") {
state = 0;
}
}
}
alert ("INFO: \n" + info);
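The same single-pass idea can be extended to also pick up bare addresses that have no angle brackets. The sketch below is written in Python purely for illustration (it is not a Node.js solution). It splits only on commas outside quotes and angle brackets, and it does not attempt to cover the full RFC 2822 grammar. The function name is hypothetical.

```python
def parse_address_list(header):
    """Split an address header into (name, address) pairs in one pass."""
    entries, current = [], []
    in_quotes = in_angle = False
    for ch in header:
        if ch == ',' and not in_quotes and not in_angle:
            # A top-level comma ends the current entry.
            entries.append(''.join(current))
            current = []
            continue
        if ch == '"' and not in_angle:
            in_quotes = not in_quotes
        elif ch == '<' and not in_quotes:
            in_angle = True
        elif ch == '>' and not in_quotes:
            in_angle = False
        current.append(ch)
    entries.append(''.join(current))

    parsed = []
    for entry in entries:
        entry = entry.strip()
        if not entry:
            continue
        if '<' in entry and entry.endswith('>'):
            # "Display Name" <address>
            name, _, rest = entry.rpartition('<')
            parsed.append((name.strip().strip('"'), rest[:-1].strip()))
        else:
            # Bare address with no display name.
            parsed.append(('', entry))
    return parsed
```

Tracking the quote and bracket state is what keeps "Development, Business" together as one display name instead of splitting it at the comma.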
For something complete, you should port this to JS: http://cpansearch.perl.org/src/RJBS/Email-Address-1.895/lib/Email/Address.pm
It gives you all the parts you need. The tricky bit is just the set of regexps at the start.

Resources