Variability-Based Neighbour Clustering

joshua.ballance · May 27, 2020, 7:59pm

Hi all, as part of a corpus study I’m interested in using clustering techniques to group observations. Thanks to another discussion here I came across VNC as a possible technique for this.

I’m hoping to use @folgert’s code for this, but I’m having an issue running it with my version of Python (3.7.4). It installed fine, but when running the VNC notebook I get this error in the fifth cell:

    ~/Downloads/diachronic-text-analysis-master/HACluster/dendrogram.py in ete_tree(self, labels)
        140             from ete2 import Tree, NodeStyle, TreeStyle
        141         elif sys.version_info[0] == 3:
    --> 142             from ete3 import Tree, NodeStyle, TreeStyle
        143         else:
        144             raise ValueError('Your version of Python is not supported.')

    ModuleNotFoundError: No module named 'ete3'

Wondering:
a) Whether anyone has any thoughts on this problem (which version of Python was this written for?)
b) Whether anyone knows of related code that does something similar (other than @mike.kestemont’s code for his Beckett project)?
c) More broadly whether anyone has particular opinions on this topic of style-based clustering? Obviously developing chronologically contiguous clusters is helpful on one level but hardly exhaustive, and I wonder what techniques others have used? I’ve employed some basic K-Means (although this often ends up producing chronologically contiguous clusters if corpus position is a variable) but not much beyond that.

Cheers

Josh

folgert · May 27, 2020, 8:07pm

Hi! Python 3.7 should probably work, but you never know… The error here seems to be a missing package: ete3, from the ETE Toolkit (http://etetoolkit.org/), which is probably missing from the requirements in the setup.py script. With pip you can install it using:

    pip install ete3
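
If you want to confirm that the package is visible to the same Python environment the notebook runs in, something like the following should work. It is only a sketch: `missing_packages` is a hypothetical helper name, not part of the repository, and it relies solely on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that the import system cannot find."""
    # find_spec returns None when no installed module matches the name.
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means ete3 can be imported from this environment.
print(missing_packages(["ete3"]))
```

Running this in a notebook cell checks the kernel's environment, which can differ from the one pip installed into on the command line.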

I’m curious what other people have to say about b) and c)!

folgert · May 28, 2020, 7:48am

Ah, of course there’s also the original R code written by Stefan Gries and Martin Hilpert: https://global.oup.com/us/companion.websites/fdscontent/uscompanion/us/static/companion.websites/nevalainen/Gries-Hilpert_web_final/vnc.individual.html
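
For anyone who wants the gist before digging into either implementation, here is a minimal sketch of the neighbour constraint. It is not the Gries and Hilpert R code, nor the HACluster code from the notebook. It assumes one numeric value per time slice and uses the absolute difference between cluster means as the dissimilarity, which is a simplifying assumption, and `vnc_merge_order` is a made-up name.

```python
from statistics import mean


def vnc_merge_order(values, labels=None):
    """Greedy agglomeration in which only temporally adjacent clusters may merge.

    Returns a list of (left_labels, right_labels, distance) tuples in merge order.
    """
    if labels is None:
        labels = list(range(len(values)))
    # Each cluster holds its member labels and member values, in corpus order.
    clusters = [([lab], [val]) for lab, val in zip(labels, values)]
    merges = []
    while len(clusters) > 1:
        # Assumed dissimilarity: absolute difference of neighbouring cluster means.
        dists = [
            abs(mean(clusters[i][1]) - mean(clusters[i + 1][1]))
            for i in range(len(clusters) - 1)
        ]
        # Pick the closest pair of neighbours (the first one on ties).
        i = min(range(len(dists)), key=dists.__getitem__)
        left, right = clusters[i], clusters[i + 1]
        merges.append((left[0], right[0], dists[i]))
        # Replace the two neighbours with their union, preserving order.
        clusters[i:i + 2] = [(left[0] + right[0], left[1] + right[1])]
    return merges


years = [1900, 1910, 1920, 1930, 1940]
scores = [1.0, 1.2, 5.0, 5.1, 8.0]  # made-up values, for illustration only
for left, right, dist in vnc_merge_order(scores, years):
    print(left, "+", right, "at distance", round(dist, 2))
```

Because only neighbours are ever compared, every cluster is a contiguous stretch of the timeline, which is what separates this from unconstrained hierarchical clustering or K-Means.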

joshua.ballance · May 28, 2020, 7:26pm

Great, combined with the updated code that’s working well for me now. Thanks!