Track website browsing time using Python.

As programmers, we spend most of our working day in front of a computer, and much of that time online: searching for solutions on Stack Overflow, keeping up with coworkers on Slack, reading documentation, and so on. But the internet comes with a lot of distractions. We can easily end up with tabs for Stack Overflow, Instagram, Gmail, the Python docs and YouTube all open at the same time. It may not feel like it, but this wastes a lot of time. Tracking your website browsing time gives you an idea of how productive your day really is.

I will teach you how to build a Python script which can track which websites you are browsing and how much time you spend on them.

How will we do this?

This project will have two components:

  1. A Chrome extension which will monitor the website you are currently viewing.
  2. A Flask server which will use this data to calculate the time spent on each website.

The Chrome extension will have a background script which monitors any change in the active tab or its URL. Whenever a change occurs, it sends the new active URL to the Flask server in an HTTP request. The Flask server then calculates the time elapsed between the previous request and the new one. This is the time spent on the previous URL.

In short: store the timestamp of each URL whenever it is opened, and update that timestamp whenever the URL is re-opened. Then subtract the previous URL's timestamp from the current time to get the time spent on the previous URL.
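
The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `record_visit` and its explicit `now` argument are hypothetical and are not part of the final server code, which reads the clock itself.

```python
# Hypothetical sketch of the timestamp bookkeeping; names are illustrative.
url_timestamp = {}   # URL -> time (in seconds) at which it last became active
url_viewtime = {}    # URL -> total seconds spent on it so far
prev_url = None      # URL that was active before the current request


def record_visit(url, now):
    """Register that `url` became active at time `now` (in seconds)."""
    global prev_url
    url_viewtime.setdefault(url, 0)
    if prev_url is not None:
        # Time spent on the previous URL = now - moment it became active.
        url_viewtime[prev_url] += now - url_timestamp[prev_url]
    url_timestamp[url] = now
    prev_url = url


record_visit("google.com", 100)
record_visit("twitter.com", 130)   # 30 seconds were spent on google.com
record_visit("google.com", 140)    # 10 seconds were spent on twitter.com
print(url_viewtime)                # {'google.com': 30, 'twitter.com': 10}
```

Passing the time in as an argument keeps the sketch easy to follow by hand; the real server would call time.time() instead.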

First we will make a Chrome extension which sends an HTTP request to the Flask server as soon as the active URL changes.

Don’t worry, making such a Chrome extension is not hard. You only need basic JavaScript knowledge, and I will guide you through the steps.

Making the chrome extension.

Note that we do not need to make a front end for our extension, as it serves only a simple purpose. We will just write a background script which runs in the background of the browser while the extension is active. Our extension has three files:

  • background.js: This file will contain the code which runs in the background of the browser.
  • icon.png: Our extension icon file. You can download any icon in PNG format from the internet and use it.
  • manifest.json: This file contains the basic configuration of our extension.

First of all, let us start coding our background.js file.

The job of this file is simple: watch for the user switching between tabs or opening a new tab and, as soon as that happens, send an HTTP request to the server with the new current URL as the data. To detect when a tab becomes active, the Chrome API has a built-in listener which is invoked every time the active tab changes:


chrome.tabs.onActivated.addListener(function (activeInfo) {
    // Look up the newly activated tab to read its URL.
    chrome.tabs.get(activeInfo.tabId, function (tab) {
        var url = tab.url;
        var xhttp = new XMLHttpRequest();
        xhttp.onreadystatechange = function () {
            // Log the server's reply once the request has completed successfully.
            if (this.readyState == 4 && this.status == 200) {
                console.log(this.responseText);
            }
        };
        xhttp.open("POST", "http://127.0.0.1:5000/send_url");
        xhttp.send("url=" + url);
    });
});

In the above code, the callback passed to onActivated.addListener() is invoked whenever a tab becomes active in the browser window. Inside this callback, we get the ID of the tab from activeInfo.tabId and its URL from tab.url. Then we send an HTTP request using XMLHttpRequest() to the /send_url endpoint with the URL as the data.

Now we have to create a listener for when the URL of the active tab changes, for example when the user follows a link or types a new address. We use onUpdated.addListener() for this. Its callback is invoked whenever a tab is updated.


chrome.tabs.onUpdated.addListener((tabId, change, tab) => {
    if (tab.active && change.url) {

        var xhttp = new XMLHttpRequest();
        xhttp.onreadystatechange = function () {
            if (this.readyState == 4 && this.status == 200) {
                console.log(this.responseText);
            }
        };
        xhttp.open("POST", "http://127.0.0.1:5000/send_url");
        xhttp.send("url=" + change.url);

    }
});

In onUpdated.addListener(), the change object can be used to access the new URL. We again send this changed URL to the server using XMLHttpRequest().

Now we also have to detect when a tab is closed, so that we can update its timestamp and view time in the server code. For this, we use a combination of the onUpdated and onRemoved listeners. The URL of the removed tab cannot be accessed from inside the onRemoved callback, so we record each tab's URL in onUpdated and look it up when the tab is removed.

// define a mapping between tabId and url:
var tabToUrl = {};

chrome.tabs.onUpdated.addListener(function (tabId, changeInfo, tab) {
    // store tabId and tab url as key value pair:
    tabToUrl[tabId] = tab.url;
});

chrome.tabs.onRemoved.addListener(function (tabId, removeInfo) {
    // since tab is not available inside onRemoved,
    // we have to use the mapping we created above to get the removed tab url:
    console.log(tabToUrl[tabId]);

    var xhttp2 = new XMLHttpRequest();
    xhttp2.onreadystatechange = function () {
        if (this.readyState == 4 && this.status == 200) {
            console.log(this.responseText);
        }
    };
    xhttp2.open("POST", "http://127.0.0.1:5000/quit_url");
    xhttp2.send("url=" + tabToUrl[tabId]);

    // Remove information for non-existent tab
    delete tabToUrl[tabId];
});

With this, we have completed the background.js code. Now let’s create manifest.json and add the following code to it:


{
	"manifest_version": 2,
	"name": "Currenturl",
	"description": "Fetches current tab url.",
	"version": "0.1",
	"author": "Tarun Khare",
	"browser_action": {
		"default_icon": "icon.png",
		"default_title": "Just observing your current url."
	},
	"permissions": ["tabs", "activeTab", "http://127.0.0.1:5000/*", "storage"],
	"background": {
		"scripts": ["background.js"],
		"persistent": false
	}
}

In this file, we define things like the extension name, version and icon file. We also have to grant some permissions to our extension, so in “permissions” we list “tabs”, “activeTab”, “storage” and the server URL, which is 127.0.0.1:5000 in my case. In “background”, we give the name of our background script, i.e. “background.js”.

Now our Chrome extension is complete, and we can start writing our Python script. In our Python code, we first import the required Flask modules and define some global variables.


from flask import Flask, jsonify, request
import time

app = Flask(__name__)
url_timestamp = {}
url_viewtime = {}
prev_url = ""

In the above code, we will use url_timestamp to store the latest unix timestamp when a URL became active. For example, if “google.com” becomes active, then our extension will send a request to our server with “google.com” as data. As soon as this request reaches the server, we will find the unix time at that particular moment, and store this timestamp as a value to url_timestamp["google.com"].

Similarly, url_viewtime will store the total time a URL has been viewed in seconds.

Since URLs are often long, let us create a function which strips a URL down to its host only. For example, “https://twitter.com/codeharvestio” should be shortened to “twitter.com”.


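def url_strip(url):
    # Drop the scheme and any stray double quotes.
    if "http://" in url or "https://" in url:
        url = (url.replace("https://", '').replace("http://", '')
                  .replace('\"', ''))
    # Keep only the part before the first slash, i.e. the host.
    if "/" in url:
        url = url.split('/', 1)[0]
    return url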
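
The same job can also be handed to Python's standard library. The sketch below is an alternative, not the approach used in the rest of this article, and the helper name `url_host` is hypothetical. One difference to be aware of: `hostname` drops any port number and lowercases the host, whereas url_strip keeps the port.

```python
from urllib.parse import urlsplit


def url_host(url):
    """Return only the host part of a URL (hypothetical helper)."""
    url = url.strip().strip('"')
    parts = urlsplit(url)
    if not parts.netloc:
        # Without a scheme, urlsplit treats the whole string as a path,
        # so retry with a leading "//" to mark the start of the host.
        parts = urlsplit("//" + url)
    return parts.hostname or ""


print(url_host("https://twitter.com/codeharvestio"))   # twitter.com
print(url_host("twitter.com/codeharvestio"))           # twitter.com
```

Either function can be used as the key for the two dictionaries, as long as the same one is used everywhere.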

Now we will write the function for the ‘/send_url’ endpoint. In this function, whenever we receive a new URL from our extension, we find the current Unix timestamp using the time.time() function. Then we look up the timestamp of the previous URL in the url_timestamp dictionary. Subtracting that timestamp from the current one gives the time spent on the previous URL, which we add to the total view time of the previous URL. Here is the code for the ‘/send_url’ endpoint.


@app.route('/send_url', methods=['POST'])
def send_url():
    resp_json = request.get_data()
    params = resp_json.decode()
    url = params.replace("url=", "")
    print("currently viewing: " + url_strip(url))
    parent_url = url_strip(url)

    global url_timestamp
    global url_viewtime
    global prev_url

    print("initial db prev tab: ", prev_url)
    print("initial db timestamp: ", url_timestamp)
    print("initial db viewtime: ", url_viewtime)

    if parent_url not in url_timestamp.keys():
        url_viewtime[parent_url] = 0

    if prev_url != '':
        time_spent = int(time.time() - url_timestamp[prev_url])
        url_viewtime[prev_url] = url_viewtime[prev_url] + time_spent

    x = int(time.time())
    url_timestamp[parent_url] = x
    prev_url = parent_url
    print("final timestamps: ", url_timestamp)
    print("final viewtimes: ", url_viewtime)

    return jsonify({'message': 'success!'}), 200
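
One detail of send_url() worth a closer look is how the URL is pulled out of the request body. `params.replace("url=", "")` removes every occurrence of "url=", including any that appear inside the URL itself. The sketch below strips only the leading prefix; the helper name `extract_url` is an assumption for illustration and does not appear in the code above.

```python
def extract_url(body):
    """Pull the URL out of a raw request body such as b'url=https://...'."""
    text = body.decode("utf-8", errors="replace")
    prefix = "url="
    # Strip only a leading "url="; later occurrences belong to the URL itself.
    if text.startswith(prefix):
        text = text[len(prefix):]
    return text


body = b"url=https://example.com/redirect?url=docs"
print(extract_url(body))                     # prefix removed once
print(body.decode().replace("url=", ""))     # also alters the query string
```

Because url_strip() keeps only the host, the difference rarely shows up in this project, but it would matter if the full URL were ever stored.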

For the ‘/quit_url’ endpoint, we don’t have to write much code. The reason is that when the active tab is closed, Chrome makes another tab active, so the onActivated listener from the first code snippet runs and calls the send_url() function in our Python script, which takes care of the bookkeeping. We will still create this function, but only for printing purposes.


@app.route('/quit_url', methods=['POST'])
def quit_url():
    resp_json = request.get_data()
    print("Url closed: " + resp_json.decode())
    return jsonify({'message': 'quit success!'}), 200

Finally, add the following line to complete the server code:

app.run(host='0.0.0.0', port=5000)
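
Once the server is running, it can also be exercised without the browser. The following is a minimal sketch that builds the same kind of POST request the extension sends; the helper names `build_request` and `post_url` are hypothetical, and `post_url` only works while the Flask server is listening on 127.0.0.1:5000.

```python
from urllib.request import Request, urlopen

SERVER = "http://127.0.0.1:5000"  # address used throughout this article


def build_request(endpoint, url):
    """Build the POST request the extension would send (nothing is sent yet)."""
    data = ("url=" + url).encode("utf-8")
    headers = {"Content-Type": "text/plain;charset=UTF-8"}
    return Request(SERVER + endpoint, data=data, headers=headers, method="POST")


def post_url(endpoint, url):
    """Send the request; requires the Flask server to be running."""
    with urlopen(build_request(endpoint, url), timeout=5) as resp:
        return resp.read().decode("utf-8")


req = build_request("/send_url", "https://docs.python.org/3/")
print(req.get_method(), req.full_url, req.data)
```

Calling post_url("/send_url", ...) with a few different URLs, a few seconds apart, should make the server print its updated dictionaries just as it does for real tab switches.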

Run your Python code. Then open Chrome. Click on the three-dot symbol at the top right corner, then ‘More tools’, then ‘Extensions’. Turn on developer mode from the top right corner of this page.

Now click on ‘Load unpacked’ to load your extension. Select the folder containing your three extension files. (Please don’t keep your Python file in this folder, to avoid problems.)

As soon as your extension starts running, you can open new tabs or switch between open tabs. Spend some time on them, then go back to your running Python server. You will see the time per URL being printed in the Python console.

As you switch between tabs or open a new tab or URL, the updated output will be re-printed on the console.
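
The raw dictionaries printed by the server are not very easy to read. As a small optional sketch, the view times could be formatted as a sorted report. The function name `format_report` and the sample numbers are made up for illustration; in practice you would pass in the server's url_viewtime dictionary.

```python
def format_report(viewtime):
    """Return one line per site, longest view time first."""
    lines = []
    ranked = sorted(viewtime.items(), key=lambda item: item[1], reverse=True)
    for site, seconds in ranked:
        minutes, secs = divmod(int(seconds), 60)
        hours, minutes = divmod(minutes, 60)
        lines.append(f"{site:<25} {hours:02d}:{minutes:02d}:{secs:02d}")
    return "\n".join(lines)


sample = {"twitter.com": 95, "stackoverflow.com": 3725}  # made-up numbers
print(format_report(sample))
```

With the sample data, stackoverflow.com is listed first with 01:02:05, followed by twitter.com with 00:01:35.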

That’s it. I hope you liked the article. Leave a comment if you have any questions. Also check out my articles on Medium and on this blog, and follow me on Twitter.
