Formerly, I was a researcher with the I. Below are some news and notes on my research activities in the areas of data mining and information retrieval. You can follow me on Twitter.

Web search engines use very complex models to estimate the relevance of a document with respect to a query. These models are made of thousands of regression trees, and their evaluation is computationally expensive. Faster evaluation is achieved thanks to a novel data layout that provides better cache locality and that transforms the forest traversal into fast bitwise operations.
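To illustrate how a tree traversal can be turned into bitwise operations, here is a minimal Python sketch. It is a simplified illustration under stated assumptions, not the published implementation: the nested-tuple tree encoding and the names `compile_tree`, `score_bitwise`, `score_classic` and `score_forest` are hypothetical, and the cache-friendly layout of the whole forest is not reproduced. Each tree keeps one bit per leaf; every node whose test is false clears the bits of the leaves in its left subtree, and the exit leaf is the leftmost leaf that survives.

```python
from typing import List, Sequence, Tuple, Union

# Hypothetical encoding: a leaf is a number, an internal node is
# (feature_index, threshold, left_subtree, right_subtree).
# The test "x[feature] <= threshold" sends the document to the left child.
Tree = Union[float, Tuple[int, float, "Tree", "Tree"]]
Compiled = Tuple[List[float], List[Tuple[int, float, int]], int]


def compile_tree(tree: Tree) -> Compiled:
    """Number leaves left to right (leaf 0 = lowest bit) and build node masks."""
    leaves: List[float] = []
    raw: List[Tuple[int, float, int]] = []

    def visit(node: Tree) -> None:
        if not isinstance(node, tuple):
            leaves.append(float(node))
            return
        feature, threshold, left, right = node
        lo = len(leaves)
        visit(left)
        hi = len(leaves)
        # Bits lo..hi-1 identify the leaves of the left subtree.
        left_bits = ((1 << (hi - lo)) - 1) << lo
        raw.append((feature, threshold, left_bits))
        visit(right)

    visit(tree)
    all_ones = (1 << len(leaves)) - 1
    # A node's mask has zeros exactly on the leaves of its left subtree.
    nodes = [(f, t, all_ones & ~bits) for f, t, bits in raw]
    return leaves, nodes, all_ones


def score_bitwise(compiled: Compiled, x: Sequence[float]) -> float:
    """Find the exit leaf with AND operations instead of pointer chasing."""
    leaves, nodes, alive = compiled
    for feature, threshold, mask in nodes:
        if x[feature] > threshold:  # test is false: left subtree is unreachable
            alive &= mask
    exit_leaf = (alive & -alive).bit_length() - 1  # index of the lowest set bit
    return leaves[exit_leaf]


def score_classic(tree: Tree, x: Sequence[float]) -> float:
    """Reference root-to-leaf traversal, used only to check the sketch."""
    node = tree
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if x[feature] <= threshold else right
    return float(node)


def score_forest(compiled_forest: Sequence[Compiled], x: Sequence[float]) -> float:
    """Additive ensemble: the document score is the sum of the tree outputs."""
    return sum(score_bitwise(c, x) for c in compiled_forest)


if __name__ == "__main__":
    toy_tree = (0, 0.5, (1, 2.0, 0.1, 0.2), (1, 1.0, 0.3, 0.4))
    print(score_bitwise(compile_tree(toy_tree), [0.0, 3.0]))
```

Because the nodes can be visited in any order, the per-node tests can be laid out contiguously in memory, which is what makes this style of evaluation friendlier to caches than following child pointers.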
Additional improvements were achieved by vectorizing the algorithm: by exploiting SIMD instructions, it is possible to evaluate multiple documents simultaneously [6]. We also developed novel algorithms aimed at improving the quality of ranking models. These results are the outcome of a quality vs. RankEval aims at providing a common ground for several Learning to Rank libraries by providing useful tools for a comprehensive comparison and in-depth analysis of ranking models.
Fast ranking with additive ensembles of oblivious and non-oblivious regression trees.
RankEval: An evaluation and analysis framework for learning-to-rank solutions.
QuickScorer: A fast algorithm to rank documents with additive ensembles of regression trees.
Exploiting CPU SIMD extensions to speed-up document scoring with tree ensembles.

Adversarial Machine Learning

To date, research on adversarial ML has mostly focused on deep neural networks.
Adversarial training of gradient-boosted decision trees.

Frequent Pattern Mining

Frequent pattern mining is one of the most typical data mining tasks, and it is computationally expensive. A very interesting result was presented at ACM SIGKDD [1]: we showed that it is possible to extract relevant patterns with a single scan of the dataset by exploiting smart sampling strategies, thus allowing the mining of very large datasets.
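To make the task concrete, the following is a minimal brute-force sketch in Python. It is not the sampling-based algorithm of [1]: it simply enumerates every itemset of each transaction up to a size limit and counts how many transactions contain it. The function names, the `max_size` parameter and the toy dataset are assumptions introduced for illustration. The number of candidate itemsets grows exponentially with the transaction length, which is why the task is expensive on real data.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def parse_transactions(text: str) -> List[List[int]]:
    """Parse one transaction per line, items as space-separated integers."""
    return [[int(tok) for tok in line.split()]
            for line in text.splitlines() if line.strip()]


def frequent_itemsets(transactions: List[List[int]],
                      min_support: int,
                      max_size: int = 3) -> Dict[Tuple[int, ...], int]:
    """Return every itemset (up to max_size items) found in at least
    min_support transactions, together with its support count."""
    counts: Counter = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore duplicates, fix an item order
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    toy = "1 2 3\n1 2\n2 3\n1 2 3 4"  # hypothetical toy dataset
    for itemset, support in sorted(frequent_itemsets(parse_transactions(toy), 3).items()):
        print(itemset, support)
```

Practical miners avoid this exhaustive enumeration, for example by pruning candidates whose subsets are already known to be infrequent.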
We also developed out-of-core [9], multi-core [10], and distributed pattern mining algorithms [7]. In fact, it supports both out-of-core and multi-core mining.
It extracts all the possible association rules, without assuming that, given a frequent itemset, the supports of its subsets are known. The input format is the usual ASCII format: 1 2 3

Conquest: A constraint-based querying system for exploratory pattern discovery.
A constraint-based querying system for exploratory pattern discovery. Information Systems, 34(1):3-27.
On closed constrained frequent pattern mining.
On condensed representations of constrained frequent patterns. Knowledge and Information Systems, 9(2).
Extending the state-of-the-art of constraint-based pattern discovery. Data and Knowledge Engineering, 60(2).
Mining home: Towards a public resource computing framework for distributed data mining. Concurrency and Computation: Practice and Experience, 22(5).
Mining frequent closed itemsets out of core.
Parallel mining of frequent closed patterns: Harnessing modern computer architectures.
A unifying framework for mining approximate top-k binary patterns.

Entity Linking for Document Understanding
Entity Linking is the task of identifying named entities (places, people, concepts, etc.). Indeed, as entity relatedness is the building block of most approaches, we showed how to improve the accuracy of several state-of-the-art algorithms.
Finally, we developed a new Entity Linking method [4], named SEL, aimed at estimating the relevance of the mentioned entities and at improving the accuracy in the detection of the most relevant ones, with a 2x F-measure improvement.

Learning relatedness measures for entity linking.
Dexter 2. Semantic Web Conference.
Manual annotation of semi-structured documents for entity-linking. Conference on Information and Knowledge Management.

Large-scale Data Mining

We discuss two major results. In [2] we developed a new algorithm, named Cracker, for the detection of connected components in very large graphs.
The proposed algorithm, implemented over Spark, is able to process billions of nodes and hundreds of billions of arcs, with a speed-up factor in the range 1. In [1] we tackled the similarity self-join problem in large document collections.
The proposed MapReduce algorithm exhibits a speed-up w. In both works, we adopted a common approach based on pruning non-informative data in order to reduce the computational cost of the algorithms.

Document similarity self-join with MapReduce.
Fast connected components computation in large graphs by vertex pruning.
Web Mining

Finally, we developed a novel personalized news recommendation algorithm that exploits social network data: it analyzes the tweets of a user and of the user's social circles to recommend the items published by news agencies that are most relevant for the given user [3].
Identifying task-based sessions in search engine query logs.
Discovering tasks from search engine query logs. ACM Trans. ACM Notable Article.
From chatter to headlines: Harnessing the real-time web for personalized news recommendation.

Claudio Lucchese. Some rights reserved.