As my work involves viewing many items from a website, I need to know which items have been visited and which not, so as to avoid repeated viewing.
The problem is that the URLs of these items include some garbage parameters that change dynamically. This means the browser's history record is almost useless for identifying which items have already been viewed.
This is an example of the URL:
https://example.com/showitemdetail/?item_id=e6de72e&hitkey=true&index=234&cur_page=1&pageSize=30
Only the "item_id=e6de72e" part is useful in identifying each item. The other parameters are dynamic garbage.
My question is: how can I make Chrome mark only the "example.com/showitemdetail/?item_id=e6de72e" part as visited and ignore the remaining parameters?
Please note that I do NOT want to modify the URLs, because that might lead the website's server to suspect that I am abusing their database. I want the garbage parameters to still be there, but the browser history mechanism to ignore them.
I know this is not easy. I am proposing a possible solution, but do not know whether it can be implemented. It's like this:
Step 1) An extension background script extracts the item_id from each page I open and stores it in a collection of strings. This collection should be saved in a file somewhere.
Step 2) Each time I open a webpage with a list of items, the background script checks whether each URL contains a string matching any one in the above collection. If so, that URL is automatically added to history, and the item will then naturally be shown as visited.
Does the logic sound OK? And if so, how can it be implemented as a simple extension?
Of course, if you have other, neater solutions, I'd be very interested to learn about them.
Assuming that the links to the items always include the item_id, that would work, yes.
You would need the following steps:
Recording an element
Use a content_script that injects code into the product pages and records each visit.
On accessing the product page:
i. You can extract the current product id by checking the URL parameters (see one of these codes).
ii. You use the storage API to retrieve a stored variable, say visited_products. In your code, handle it as a Set, since that is the best data type for unique elements. Note that chrome.storage cannot hold a Set directly (values are serialized in a JSON-like way), so save it as an array and rebuild the Set when you read it back.
iii. You check whether the current id is already in the Set with .has(). If it is, there is nothing to do. If it is not, you add it with .add(). Since a Set never holds duplicates, you can also skip the check and call .add() directly. Finally, make sure you save the updated collection back to chrome.storage.
Now you have registered a visit to a product.
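To illustrate the recording part, here is a minimal sketch of keeping the visited ids as a Set in code while storing them as a plain array. The helper names (serializeVisited, deserializeVisited, recordVisit) and the second id are made up for the example, and the chrome.storage calls are only shown as comments because they need an extension context.

```javascript
// Sets are stored as arrays because chrome.storage keeps JSON-like values.
function serializeVisited(visitedSet) {
  return Array.from(visitedSet);
}

// Rebuild the Set from whatever was stored (nothing is stored on first run).
function deserializeVisited(storedValue) {
  return new Set(Array.isArray(storedValue) ? storedValue : []);
}

// Add one id and return the array to write back to storage.
function recordVisit(storedValue, itemId) {
  const visited = deserializeVisited(storedValue);
  visited.add(itemId); // a Set ignores ids it already contains
  return serializeVisited(visited);
}

// In the extension, the value would be read from and written to storage:
// chrome.storage.local.get({ visited_products: [] }, function (data) {
//   const updated = recordVisit(data.visited_products, itemId);
//   chrome.storage.local.set({ visited_products: updated });
// });

// Simulated visits (the ids are example values).
let stored = recordVisit(undefined, "e6de72e");
stored = recordVisit(stored, "e6de72e"); // repeated visit, no duplicate
stored = recordVisit(stored, "a1b2c3d");
```

Keeping the conversion in two small helpers means the rest of the code can work with .has() and .add() and never touches the stored array directly.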
Checking visited elements
You use a content_script again to be inserted on product pages or all pages if desired.
You get all the links of the page with document.querySelectorAll(). You could apply a CSS selector like: a[href*="example.com/showitemdetail/?item_id="] which would select all the links whose href contains that URL portion.
Then, you iterate the links with a for loop. On each iteration, you extract the item_id. Probably, the easiest way is: /(?:item_id=)(.*?)(?:&|$)/. This matches all characters preceded by item_id= (not captured) until it finds an & or end of the string (whichever happens first, and not captured).
With the id captured, you can check the Set from the first part with .has() to see whether it's on the list.
How you handle the links that are on the list is up to you. You could hide visited elements, or apply a different CSS class or style to them so you can tell them apart easily.
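The checking part can be sketched roughly as follows. It uses the regex mentioned above. The function names extractItemId and splitByVisited, as well as the second example URL, are assumptions for illustration. The DOM lookup is left as a comment so the logic can be tried outside the browser.

```javascript
// Pull the item_id value out of a link, or return null if there is none.
function extractItemId(href) {
  const match = /(?:item_id=)(.*?)(?:&|$)/.exec(href);
  return match ? match[1] : null;
}

// Separate links into already-visited and not-yet-visited ones.
function splitByVisited(hrefs, visitedIds) {
  const visited = [];
  const unvisited = [];
  for (const href of hrefs) {
    const id = extractItemId(href);
    if (id !== null && visitedIds.has(id)) {
      visited.push(href);
    } else {
      unvisited.push(href);
    }
  }
  return { visited: visited, unvisited: unvisited };
}

// In a content script the hrefs would come from the page, for example:
// const links = document.querySelectorAll('a[href*="example.com/showitemdetail/?item_id="]');

const visitedIds = new Set(["e6de72e"]);
const result = splitByVisited(
  [
    "https://example.com/showitemdetail/?item_id=e6de72e&hitkey=true&index=234&cur_page=1&pageSize=30",
    "https://example.com/showitemdetail/?item_id=0f9a111&hitkey=false&index=7&cur_page=2&pageSize=30",
  ],
  visitedIds
);
```

From there, the visited links can be hidden or given a CSS class, whichever you prefer.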
I hope this gives you a head start. Maybe you can give it a try and, if you cannot make it work, you can open a new question with where you got stuck.
Thanks a lot, fvbuendia. After some trial, error, and elbow grease, I got it working.
I will not post all the code here, but will give several tips for other users' reference:
1) To get the URL of newly opened webpage and extract the IDs, use chrome.tabs.onUpdated.addListener and extractedItemId = tab.url.replace(/..../, ....);
2) Then save the IDs to storage.local, using chrome.storage.local.set and chrome.storage.local.get. The IDs should be saved to an object array.
1) and 2) should be written in the background script.
3) Each time the item list page is opened, the background calls a function in the content script, asking for all the URLs in the page. Like this:
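chrome.tabs.onUpdated.addListener(function(tabId, changeInfo, tab) {
  // Wait until the page has finished loading.
  if (changeInfo.status == "complete") {
    if (tab.url.indexOf("some string typical of the item list page URL") > -1) {
      // Pass tabId so the call targets the tab that just finished loading
      // (null would target the active tab, which may be a different one).
      chrome.tabs.executeScript(tabId, { code: 'getalltheurls();' });
    }
  }
});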
4) The function to be executed in content script:
function getalltheurls() {
  var urls = [];
  var links = document.links;
  for (var i = 0; i < links.length; i++) {
    if (links[i].href.indexOf("some string typical of the item list URLs") > -1) {
      urls.push(links[i].href);
    }
  }
  chrome.runtime.sendMessage({ urls: urls });
}
5) The background script receives the URLs, then converts each one to an ID, using
idinlist = urls[i].replace(........)
6) Then the background script reads local storage with chrome.storage.local.get and checks whether each ID is in the stored array. If so, it adds the URL to history.
for (var i = 0; i < urls.length; i++) {
  var idinlist = urls[i].replace(........); // same extraction as in step 5, done per URL
  if (storedIDs.indexOf(idinlist) > -1) { chrome.history.addUrl({ url: urls[i] }); }
}
I'd like to pass the Unix timestamp to a hit level eVar in DTM. I would assume I could pass some Javascript like this:
function() {
var now = new Date();
return now.getTime();
}
However, I am not sure where to pass it in DTM. Would this be passed in the "Customize Page Code" editor in the Tool Settings or somewhere else?
You can create a Data Element of type Custom Code. Name it something like current_timestamp or whatever. The code should not be wrapped in the function declaration syntax (DTM already wraps it in a function callback internally). So just put the following in the code box:
var now = new Date();
return now.getTime();
Then in your Adobe Analytics Tool Config (for global variables), or within a Page Load, Event Based, or Direct Call Rule, within the Adobe Analytics Config section, choose which eVar you want to set, and for the value, put %current_timestamp% (or whatever you named it, wrapped in % at the start and end; you should see it show up in a dropdown as you start typing % in the value field).
Alternatively, if you want to assign the eVar in a custom code box in one of those locations, you can use the following JavaScript syntax (assuming eVar1 in this example):
s.eVar1 = _satellite.getVar('current_timestamp');
Note that with this syntax, you do not wrap the data element name with %
One last note. This is client-side code, so the timestamp is based on the clock of the visitor's device. getTime() returns the number of milliseconds since the Unix epoch (midnight UTC on January 1, 1970), so the value itself does not depend on the browser's timezone settings: a visitor from the US and another visitor from China loading a page at the same moment will record the same timestamp, apart from any inaccuracy in their device clocks. If you need a Unix timestamp in seconds, divide the value by 1000 and round down.
Timezones only come into play if you format the timestamp as a human-readable local date or time. That can make for misleading data in reports, so in that case either break it down by other geo-based dimensions or do the conversion in your Data Element against a single timezone (e.g. EST). In practice, most people pick whatever timezone their office is located in, or else the timezone their server is set to.
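As a quick way to see how getTime() behaves, here is a small sketch that can be run in any JavaScript console. The two date strings and the helper name toUnixSeconds are made up for the example. Both strings describe the same instant as written in two different timezones.

```javascript
// The same instant, written with a US Eastern offset and a China offset.
const fromNewYork = new Date("2020-01-01T00:00:00-05:00");
const fromBeijing = new Date("2020-01-01T13:00:00+08:00");

// getTime() counts milliseconds since the Unix epoch in UTC,
// so both objects report the same number.
const sameInstant = fromNewYork.getTime() === fromBeijing.getTime();

// Convert a Date to a Unix timestamp in whole seconds.
function toUnixSeconds(date) {
  return Math.floor(date.getTime() / 1000);
}

console.log(sameInstant, toUnixSeconds(new Date()));
```

In a DTM Data Element, the body of toUnixSeconds could be returned directly if seconds are wanted instead of milliseconds.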
I'm using Angular 5 and mat-accordion to show a list of authors. Each author has written multiple books and articles. The author's name appears in the panel-header and the content of the panel shows all of the books, articles, etc.
Because I want to display 100+ authors each with 50+ entries, I don't want to populate the entire accordion and content at once. What I'd like to have happen is that when the user clicks on an author, it kicks off a service that queries the database and then fills the panel content as appropriate. If the panel is closed, the content should remain so re-expanding the panel doesn't kick off another database query.
So when I visit the page, I see the authors Alice, Bob, and Eve. When I click on Alice, the app queries the database, gets back Alice's entries, renders the content, then the accordion expands. When I click on Eve, the app should close Alice's panel, query the db, get Eve's entries, render the content, and finally expand the panel.
If I click on Alice again, Eve's panel closes, but since the content is already there for Alice, there is no db query or rendering. It just expands. The docs say to use ng-template, but I'm not sure how to do that, and really not sure how to do it so the content remains after the panel is closed. I'm not worried about the data changing in a way that would require fetching Alice's data again.
Any examples out there of the best way to handle this?
thanks!
G. Tranter's answer was correct, I was on the right path. If anyone else ends up on this page, here is what I ended up doing.
ngOnInit() {
  // authorsRetrieved is the observable that emits the list of authors;
  // worksRetrieved caches each author's works, indexed by author_id.
  this.authorsRetrieved.subscribe(authors => {
    this.allAuthors = authors as Array<any>;
    // as authors are added and deleted, the author_id won't equal the number of
    // authors, so get the highest id number, create an array long enough to
    // index it, then fill it with blanks so the keys have some value
    const maxId = Math.max.apply(Math, this.allAuthors.map(function(a) { return a.author_id; }));
    this.worksRetrieved = new Array(maxId + 1);
    this.worksRetrieved.fill([{}]);
  });
}

showAuthorsWorks(authorID: number = -1) {
  if (authorID < 0) {
    return; // no valid author selected
  }
  if (authorID >= this.worksRetrieved.length) {
    const tempArray = new Array(authorID - this.worksRetrieved.length + 1);
    tempArray.fill([{}]);
    this.worksRetrieved = this.worksRetrieved.concat(tempArray);
  }
  // only make the network call if we have to
  // because we filled the id array, we can't just use length
  if (typeof(this.worksRetrieved[authorID][0]['manuscript_id']) === 'undefined') {
    this.authorWorksService.getAuthorWorks(authorID).subscribe(works => {
      // replace the placeholder in place so the other indexes don't shift
      this.worksRetrieved.splice(authorID, 1, works as Array<any>);
    });
  }
}
I added a check for the almost impossible situation where the array is too short for the requested author_id. You have to create an empty array of N elements and then fill it. If you don't fill it, the slots stay empty, and you can't read or write a nested element of a slot that doesn't exist, even though the Chrome console reports the length as N and shows the elements as empty.
Thanks again!
If you are referring to the MatExpansionPanelContent directive used with ng-template, all that does is delay loading content until the panel is opened. It doesn't know whether or not it has already been loaded. So if you are using a bound expression for content such as {{lazyContent}}, that will be evaluated every time the panel is opened. You need to manage content caching yourself. One easy way to do that is via a getter.
In your component:
_lazyContent: string;
get lazyContent() {
  if (!this._lazyContent) {
    this._lazyContent = fetchContent();
  }
  return this._lazyContent;
}
Plus in your HTML:
<mat-expansion-panel>
...
<ng-template matExpansionPanelContent>
{{lazyContent}}
</ng-template>
....
</mat-expansion-panel>
So the ng-template takes care of the lazy loading, and the getter takes care of caching the content.
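For a list of many authors, the same caching idea can be keyed by author id. The following is only a sketch in plain JavaScript, not Angular-specific code: AuthorWorksCache, fakeLoadWorks, and the sample data are hypothetical names, and the loader stands in for whatever service call returns an author's works.

```javascript
// Loads each author's works at most once and reuses the result afterwards.
class AuthorWorksCache {
  constructor(loadWorks) {
    this.loadWorks = loadWorks; // function(authorId) returning a Promise of works
    this.cache = new Map();     // authorId -> Promise of works
  }

  get(authorId) {
    if (!this.cache.has(authorId)) {
      // First time this author is opened: start the request and remember it.
      this.cache.set(authorId, this.loadWorks(authorId));
    }
    return this.cache.get(authorId);
  }
}

// Stand-in for the real database query; counts how often it is called.
let calls = 0;
function fakeLoadWorks(authorId) {
  calls += 1;
  return Promise.resolve([{ author_id: authorId, title: "Sample work" }]);
}

const worksCache = new AuthorWorksCache(fakeLoadWorks);
const firstOpen = worksCache.get(1);  // triggers the load
const secondOpen = worksCache.get(1); // served from the cache
```

Caching the promise rather than the resolved data also avoids a second request if a panel is reopened while the first request is still in flight.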
I'm building a web crawler in F# and I'm running into a problem with how to store the pages I've already been to and the pages that I have yet to visit.
My current implementation involves tracking state with a list of records
type Page = {url:Uri; visited:bool; redirects:bool}
let createCrawlLink (url: Uri) = {url=url; visited=false; redirects=false}
let initialize url = [createCrawlLink(url)]
let uriInList(data:Page list)(uri:Uri) = List.exists (fun x -> x.url.AbsoluteUri = uri.AbsoluteUri) data
let add (data:Page list) (url) =
    let uri = new Uri(url)
    match uriInList data uri with
    | true -> data
    | false -> (createCrawlLink uri) :: data
Now when I pull the first item off that list and visit it I'd like to do a few things.
Mark the url I was directed to as visited.
If the url I ended up on wasn't the one I attempted to go to, mark that as visited and redirected.
Add all new links to the list.
Per page business logic
Recurse until everything in the list is visited.
I am getting hung up on what the functional way of altering a record's visited/redirected properties is. So far it seems like I have to find the record, make a copy with the properties I want changed, then copy the entire list into a new list with the old record removed and the new record added.
This seems like a lot of work but google isn't finding me any good data structures for this (or I don't know the words to search for). Is there a better cleaner way?
You're using a list, but as ildjarn said in a comment, you should probably use a set instead. However, if you need to keep track of multiple flags per URI (has this one been visited? does this one redirect?), then you'd have to keep track of multiple sets (visitedURIs and redirectingURIs, for example).
Therefore, the data structure you probably want is the PersistentHashMap from FSharpx.Collections. It's a persistent data structure, so it's non-destructive: every time you make an update in it, you get back a new hash map with the change, but the old hash map still exists unchanged, so any other functions that still hold a reference to it will still see a consistent view of the data (this is a HUGE advantage when you start trying to parallelize your code!).
Also note that for lists, if you need to make frequent updates in the middle of an existing list, the PersistentVector type (also from FSharpx.Collections) is very well-suited to that.
I think storing pages to visit separately from pages visited makes this simpler and more efficient whether it's functional or not.
I would store visited pages in a Map<string, Page>, where the string key is the URL, so that you can look up visited pages quickly (an F# Map is a balanced tree, so lookups take logarithmic rather than constant time).
Then take queued URLs to visit from the head of a list with pattern matching and build up the results in a map.
type Page = { url:Uri; redirects:bool }
type PagesVisited = Map<string, Page>

let rec crawl (urisToVisit:Uri list) (visited:PagesVisited) : PagesVisited =
    match urisToVisit with
    | uri :: remainingUris ->
        if Map.containsKey (uri:Uri).AbsoluteUri visited then
            crawl remainingUris visited
        else
            let (redirects, newUris) = visit uri
            let visited' = Map.add uri.AbsoluteUri {url=uri; redirects = redirects} visited
            crawl (newUris @ remainingUris) visited'
    | [] ->
        printfn "Finished the internet"
        visited

// Kick it off
crawl [Uri("https://stackoverflow.com")] Map.empty
This shows you a possible functional way of doing this loop. I've left the implementation of visit for you.
Note that adding new items to the front of a list is efficient: the existing list is reused rather than copied. The list concatenation operator @ only copies its left operand, so I put what is likely to be the shorter list in front of what is likely to be the longer one.
Similarly, the PagesVisited map is not being copied on each loop, even though each instance is immutable. Structural sharing is used so that items can be added and removed while still holding references to previous versions of the map. This is much faster than a full copy.
If you care about making this fast and efficient more than keeping it functional, you would probably use the mutable collections ResizeArray and Dictionary instead.
I have the below code in my web site.
I want to track each anchor tag using DTM. I know how to track a single element. Since we have a bunch of different elements here, can anyone help with how to track them using DTM? I don't want to create a separate rule for each element. How can we track these elements in a single rule?
Here is an example of what you can do.
For Element Tag or Selector put "a.at-share-btn" (no quotes). This will target all the relevant links first. We can look for this too in the next step, but "pre-qualifying" it with this will improve performance so that the rule is not evaluated against every single click on an a element.
Then, under Rule Conditions, add a Criteria of type Data > Custom.
In the Custom box, add the following:
var shareType = this.getAttribute('class').match(/\bat-svc-([a-z_-]+)/i);
if (shareType && shareType[1]) {
  _satellite.setVar('shareType', shareType[1]);
  return true;
}
return false;
This code looks for the class (e.g. "at-svc-facebook") and puts the last part of it (e.g. "facebook") into a data element named shareType.
Then, you can reference it using %shareType% in any of the DTM fields. Note: because this data element is made on-the-fly, it will not show up in the auto-complete when you type it out in a field.
Alternatively, in custom code boxes (e.g. if you are needing to reference it in a javascript/3rd party tag box), you can use _satellite.getVar('shareType')
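To check what the class-matching step captures before putting it into DTM, the regex can be tried on its own. This is only a sketch: extractShareType is a hypothetical helper name, and the class strings are example values modeled on the "at-svc-facebook" class mentioned above.

```javascript
// Return the service name from a class attribute, or null if none is present.
function extractShareType(className) {
  const match = (className || "").match(/\bat-svc-([a-z_-]+)/i);
  return match && match[1] ? match[1] : null;
}

const facebookType = extractShareType("at-share-btn at-svc-facebook");
const noType = extractShareType("at-share-btn");

console.log(facebookType, noType);
```

A link without an at-svc-* class gives null, which corresponds to the rule condition returning false.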
I am building an internal social networking website on SharePoint. Since it's a networking intranet, I want it to be open and non-moderated. However, I also don't want people to use abusive, foul, or otherwise bad language in the portal.
I tried Googling and wasn't really successful in finding a solution.
Microsoft Forefront will do that for me, but only for documents. I also want to do it on lists, since the discussion forum on SharePoint is in list format.
You may create a site solution/list definition for your site using the Visual Studio SharePoint Site Solution Generator. Create a custom list and name it as you wish. I name it "AbusiveWordList" in the following code example.
After creating the site solution/list definition, add the code below in the ItemAdding function. It iterates through all columns of the item being added and checks them against the custom list named "AbusiveWordList", which contains the abusive words.
The chkbody function reads the items from the custom list named "AbusiveWordList" and checks whether the body text contains any of them. If it does, the item is cancelled with an error message.
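base.ItemAdding(properties);
// Running total of matches reported by chkbody (declared here if it is not already a field).
int finalwordcount = 0;
foreach (DictionaryEntry dictionaryEntry in properties.AfterProperties)
{
    // Check the text of each submitted field against AbusiveWordList.
    string bodytext = "" + dictionaryEntry.Value;
    finalwordcount = finalwordcount + chkbody(bodytext, properties);
}
if (finalwordcount > 0)
{
    // Reject the item and show the user why.
    properties.ErrorMessage = "Abusive / Foul / Illicit information found. Kindly refer to the terms and conditions.";
    properties.Cancel = true;
}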
You will probably need to override any controls that display text to avoid this issue. As this would be a lot of work, perhaps an HTTP Module would be a better solution.
I've worked on a module that used regular expressions to make SharePoint's output XHTML compliant. Similarly, you could use regular expressions to strip out offensive words when a page is rendered. It wouldn't stop people typing them but as no-one would be able to see them this wouldn't matter. You could use a basic SharePoint custom list to store the offensive words you don't want displayed.
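The regular expression part of that idea can be sketched independently of SharePoint. The example below is in JavaScript purely for illustration, and the function names and the two sample words are made up. In a real module the word list would be read from the custom list and the replacement would run on the rendered output.

```javascript
// Escape regex metacharacters so list entries are matched literally.
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one case-insensitive pattern that matches any listed word as a whole word.
function buildWordFilter(words) {
  const pattern = words.map(escapeRegExp).join("|");
  return new RegExp("\\b(?:" + pattern + ")\\b", "gi");
}

// Replace every listed word with asterisks of the same length.
function maskWords(text, words) {
  if (words.length === 0) {
    return text; // nothing to filter
  }
  return text.replace(buildWordFilter(words), function (match) {
    return "*".repeat(match.length);
  });
}

const blockedWords = ["darn", "heck"];
const masked = maskWords("Well, darn it. What the HECK?", blockedWords);
console.log(masked);
```

The \b word boundaries keep the filter from masking longer words that merely contain a listed word.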