Hi all, as part of a corpus study I’m interested in using clustering techniques to group observations. Thanks to another discussion here, I came across VNC (variability-based neighbour clustering) as a possible technique for this.
I’m hoping to use @folgert’s code for this, but I’m having an issue running it on my version of Python (3.7.4). It installed fine, but when I run the VNC notebook I get this error in the fifth cell:
~/Downloads/diachronic-text-analysis-master/HACluster/dendrogram.py in ete_tree(self, labels)
    140             from ete2 import Tree, NodeStyle, TreeStyle
    141         elif sys.version_info[0] == 3:
--> 142             from ete3 import Tree, NodeStyle, TreeStyle
    143         else:
    144             raise ValueError('Your version of Python is not supported.')

ModuleNotFoundError: No module named 'ete3'
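In case it helps with diagnosis, here is a small check that could be run in the same notebook kernel to see which interpreter is in use and whether `ete3` is visible to it. The helper name `module_available` is my own, not something from the repository, and this only reports the state of the environment; it doesn't fix anything.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the running interpreter can locate a top-level module."""
    return importlib.util.find_spec(name) is not None


# Which Python the notebook kernel is actually running
print("Interpreter:", sys.executable)
print("Version:", sys.version_info[:3])

# Whether the package named in the traceback can be found by this interpreter
print("ete3 available:", module_available("ete3"))
```

If this prints `False`, my guess is that `ete3` simply isn't installed in the environment the kernel uses, rather than there being a Python version problem as such.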
I’m wondering whether anyone:
a) Has any thoughts on this problem (which version of Python was the code written for?)
b) Knows of any related code that does something similar (other than @mike.kestemont’s code for his Beckett project)?
c) More broadly, has particular opinions on style-based clustering? Producing chronologically contiguous clusters is obviously helpful on one level but hardly exhaustive, so I wonder what techniques others have used. I’ve tried some basic K-Means (although this often ends up producing chronologically contiguous clusters anyway if corpus position is included as a variable) but not much beyond that.
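To make clear what I mean by chronologically contiguous clusters, here is a rough sketch of the general idea as I understand it: only neighbouring time slices are allowed to merge. This is not @folgert’s implementation. The function name `neighbour_cluster`, the toy values, and the choice of "difference between cluster means" as the distance are all my own simplifying assumptions.

```python
from statistics import mean


def neighbour_cluster(values, n_clusters):
    """Merge chronologically adjacent clusters until n_clusters remain.

    values: one number per time slice, in chronological order.
    Distance between neighbours is the absolute difference of cluster
    means (an assumption made for this sketch).
    """
    if not 1 <= n_clusters <= len(values):
        raise ValueError("n_clusters must be between 1 and len(values)")

    # Start with every time slice in its own cluster
    clusters = [[v] for v in values]

    while len(clusters) > n_clusters:
        # Only adjacent clusters are candidates for merging
        gaps = [
            abs(mean(clusters[i]) - mean(clusters[i + 1]))
            for i in range(len(clusters) - 1)
        ]
        i = gaps.index(min(gaps))
        # Replace the closest pair of neighbours with their union
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]

    return clusters


# Hypothetical per-period feature values, oldest first
freqs = [1.0, 1.2, 1.1, 3.0, 3.2, 5.0]
print(neighbour_cluster(freqs, 3))
```

Because only neighbours can merge, every resulting cluster is a contiguous stretch of the timeline, which is the property K-Means doesn't guarantee unless position is fed in as a feature.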
Cheers
Josh