
I tend to use GitHub’s star feature pretty liberally: I flag projects that look immediately useful, those that are interesting because of the language or ecosystem they’re part of, and those that might be useful down the road. Starring a project is a simple fire-and-forget operation, whereas adding a bookmark to Pinboard is moderately distracting since you need to add a description, tag it, and so on. The downside is that my collection of GitHub stars lives in its own little silo, cut off from the ‘canonical’ collection of links that I have built up in Pinboard.

I had first thought of resolving this problem with IFTTT, but the triggering events supported by its GitHub channel are attached to major repository-level actions such as pull requests and issues, so that was out. Plan B was to write a script to do it myself. How hard can it be? As it happens, not all that difficult, though there were a few little corner cases to consider.

Run it

You can get the script on GitHub. While this information is also in the README, in order to run the script you will need:

- Python with the Requests library installed
- your GitHub username
- your Pinboard API token
- a GitHub API token

The last item is not strictly necessary as a) stars are part of your public profile and b) GitHub’s API can be used without authentication, but usage is limited to 60 requests/hour vs 5000 requests/hour for authenticated use; a token avoids issues with rate limiting. Finally, your GitHub username is used by the script as the HTTP user-agent string, as required by GitHub’s API.
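
To make that concrete, here is a minimal sketch of how the request headers could be assembled with Requests; the variable names are placeholders rather than the script’s actual code:

import requests

github_user = 'your-github-username'  # used as the User-Agent string
github_token = ''                     # optional API token; leave empty for unauthenticated use

headers = {'User-Agent': github_user}
if github_token:
    # Authenticated requests get the 5000 requests/hour limit
    headers['Authorization'] = 'token ' + github_token

# Without a token, the public starred listing is still available
r = requests.get('https://api.github.com/users/{0}/starred'.format(github_user),
                 headers=headers)
print(r.status_code)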

You can either pass the API tokens as command line parameters or stick them in specially-named dotfiles in your home directory:

# Assuming you copy the token to the clipboard before running each command,
# pbpaste dumps the clipboard contents to stdout
$ pbpaste > ~/.github_api_token
$ pbpaste > ~/.pinboard_api_token

# Keep prying eyes out
$ chmod 600 ~/.*_api_token

Running with dotfiles:

$ python pin-github-stars.py -u GITHUB_USERNAME

Running without dotfiles:

$ python pin-github-stars.py -g GITHUB_TOKEN -p PINBOARD_TOKEN -u GITHUB_USERNAME
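
If you’re curious how the dotfile fallback might be handled, a rough sketch follows; the function name and details are illustrative rather than lifted from the script:

import os

def read_token(cli_value, dotfile_name):
    # Prefer a value passed on the command line; otherwise read the
    # token from the named dotfile in the home directory.
    if cli_value:
        return cli_value
    path = os.path.join(os.path.expanduser('~'), dotfile_name)
    with open(path) as f:
        return f.read().strip()

# e.g. github_token = read_token(args.github_token, '.github_api_token')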

Implementation details

Requests

I love Requests. It takes its motto of “HTTP for humans” seriously, and it’s one of the first tools I reach for when I start doing anything HTTP-related in Python: it’s that good, and the standard library’s mess of options for doing the same is, comparatively, that painful. It’s so good that, for cases like this where I only need to use a subset of a service’s REST API, I typically don’t bother with the service-specific library. (It also doesn’t hurt that GitHub’s API in particular is among the better ones you may come across.)

While adding external dependencies is rarely without cost, Requests makes a strong argument for itself: the resulting code is extremely concise and, as a library, it’s largely self-contained.
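
As an illustration, adding a single bookmark through Pinboard’s posts/add endpoint boils down to one call. This is only a sketch; the fields the actual script sends may differ:

import requests

pinboard_token = 'username:XXXXXXXXXXXXXXXX'  # placeholder token from Pinboard's settings page

params = {
    'auth_token': pinboard_token,
    'url': 'https://github.com/some-user/some-repo',
    'description': 'some-repo: a project worth remembering',
    'tags': 'github python',
    'format': 'json',
}
r = requests.get('https://api.pinboard.in/v1/posts/add', params=params)
print(r.json())  # {'result_code': 'done'} on success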

Generators

This is perhaps a bit indulgent, but I’ve found that practical examples of Python generator usage are sometimes hard to find. Since GitHub’s API makes extensive use of paging for calls that may return large numbers of results, it matches up well with the “on demand” nature of a generator. I’ve slightly modified a chunk of the script here:

import re
import requests


def get_github_stars(headers, sort_dir):
    page = 1
    last = 1
    params = {'direction': sort_dir}
    while page <= last:
        params['page'] = page
        r = requests.get('https://api.github.com/user/starred',
                         params=params,
                         headers=headers)
        yield r.json()
        if 'link' in r.headers:
            m = re.search(r'page=(\d+)>; rel="next",.*page=(\d+)>; rel="last"',
                          r.headers['link'])
            # Lazy way to see if we hit the last page, which only has 'first' and 'prev' links
            if m is None:
                break
            # The captured page numbers are strings; convert them so the loop comparison works
            page, last = (int(n) for n in m.groups())
        else:
            # No link header at all means everything fit on a single page
            break

# Simple usage of the generator:
for results_page in get_github_stars(my_header_dict, 'desc'):
    # Do the thing with the stuff
    pass

When execution reaches the yield statement, a page of results is returned to the for loop and execution of the generator is suspended (though its state is preserved). When the body of the for loop finishes processing results_page and asks for the next item, execution returns to the generator and picks up after the yield statement, where the function uses the link header to work out the next and last page numbers. The while loop keeps going until the final page is fetched, which is detected by the absence of a “next” link in that page’s link header.

It’s important to keep in mind that a generator function differs from a normal function: calling a generator function directly returns a generator object; it does not start running the generator. To do that, you’d call the generator object’s next() method—in a for loop, that’s handled automatically.
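
A toy example, unrelated to the script, makes the distinction concrete:

def countdown(n):
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)  # returns a generator object; none of the body has run yet
print(next(gen))    # 3 -- runs until the first yield, then suspends
print(next(gen))    # 2 -- resumes right after the yield
print(list(gen))    # [1] -- list() drives the remaining next() calls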

If you’d like to dig deeper into the world of generators, this David Beazley presentation is a good place to start.