Abstract:
Community structure is one of the main structural features of networks, and
detecting overlapping community structure is an important problem in social network
analysis. Many methods exist for finding non-overlapping communities in this
research area. However, existing studies of overlapping communities do not sufficiently
address the relationships between objects in overlapping regions or the roles of
these objects during the formation and growth of communities. In recent years, local
community detection algorithms that detect overlapping community structure have
been developed. Local expansion methods detect a local community by growing it
outward from a seed. Recent algorithms have therefore emphasized locating good seeds
rather than selecting seeds at random. However, although most existing algorithms can
identify good seeds, their expansion strategies are neither effective nor efficient.
Moreover, these algorithms produce unstable community structure because of the
influence of the parameter that controls community resolution in the fitness
evaluation functions used in the community expansion process.
This research therefore models its algorithm on the local expansion strategy and
designs an extended Jaccard similarity to find seeds. In addition, it
formulates an optimized parameter evaluation formula to avoid the influence of the
parameter. The algorithm first identifies the seed, or core node, using the extended
Jaccard similarity and forms an initial community from the seed. A local community is
then detected by expanding the initial community with a fitness function based on the
proposed optimized parameter evaluation. Finally, overlapping nodes are identified by
merging the detected local communities. In this dissertation, the algorithm is
implemented and tested using small
datasets from the network data repository site and large networks from the Stanford
Large Network Dataset Collection. In addition to real networks, artificial overlapping
benchmarks are used to generate experimental networks. On both real and
artificial networks, the performance of the proposed algorithm is compared with
state-of-the-art algorithms using various performance evaluation metrics. In
particular, the results show that the proposed algorithm achieves better accuracy on
both real networks and benchmarks and reduces running time.
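
The three stages named above (seed selection, fitness-based expansion, and
identification of overlapping nodes) can be illustrated with a minimal sketch. It is
not the algorithm proposed in this dissertation: the abstract does not define the
extended Jaccard similarity or the optimized parameter evaluation formula, so the
sketch assumes plain Jaccard similarity on closed neighborhoods as a stand-in seed
score and the widely used fitness form k_in / (k_in + k_out)^alpha with alpha fixed
at 1.0. All function names are hypothetical.

```python
from itertools import combinations

def jaccard(graph, u, v):
    # Plain Jaccard on closed neighborhoods (a stand-in, NOT the extended variant).
    nu, nv = graph[u] | {u}, graph[v] | {v}
    return len(nu & nv) / len(nu | nv)

def seed_score(graph, u):
    # Assumed seed score: mean similarity between a node and its neighbors.
    nbrs = graph[u]
    return sum(jaccard(graph, u, v) for v in nbrs) / len(nbrs) if nbrs else 0.0

def fitness(graph, community, alpha=1.0):
    # k_in counts every internal edge twice; k_out counts edges leaving the community.
    k_in = sum(1 for u in community for v in graph[u] if v in community)
    k_out = sum(1 for u in community for v in graph[u] if v not in community)
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0

def expand(graph, seed, alpha=1.0):
    # Greedily add the frontier node with the largest fitness while fitness improves.
    community = {seed}
    current = fitness(graph, community, alpha)
    while True:
        frontier = {v for u in community for v in graph[u]} - community
        if not frontier:
            break
        best = max(sorted(frontier),
                   key=lambda v: fitness(graph, community | {v}, alpha))
        candidate = fitness(graph, community | {best}, alpha)
        if candidate <= current:
            break
        community.add(best)
        current = candidate
    return community

def detect(graph, alpha=1.0):
    # Seeds are tried in descending score order, skipping already covered nodes.
    order = sorted(graph, key=lambda u: (-seed_score(graph, u), u))
    communities, covered = [], set()
    for u in order:
        if u in covered:
            continue
        community = expand(graph, u, alpha)
        if community not in communities:
            communities.append(community)
        covered |= community
    # Overlapping nodes are those that belong to more than one local community.
    overlaps = {v for v in covered if sum(v in c for c in communities) > 1}
    return communities, overlaps

# Toy network: two 4-cliques that share node 3.
graph = {}
for a, b in list(combinations([0, 1, 2, 3], 2)) + list(combinations([3, 4, 5, 6], 2)):
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

communities, overlaps = detect(graph)
print(sorted(sorted(c) for c in communities), overlaps)
```

On this toy network the sketch recovers the two cliques as local communities and
reports the shared node 3 as the only overlapping node. Because alpha is fixed here,
the sketch remains subject to the resolution sensitivity described above.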