Online ISSN: 2515-8260

Keywords: Data Mining

Clustering Analysis from Universities in Indonesia based on Sentiment Analysis

Hendra Achmadi; Isana Meranga; Dewi Wuisan; Irwan Suarly; I Gusti Anom Yudistira; Rudy Pramono

European Journal of Molecular & Clinical Medicine, 2020, Volume 7, Issue 10, Pages 1466-1481

Two kinds of sources can be used to judge the quality of a university in Indonesia. The first is the clustering list of Indonesian universities published by the Ministry of Research, Technology and Higher Education; the second is social media, such as Twitter. In this research we use text mining and data mining methodology to build a sentiment analysis from 50,100 tweets covering 501 universities, using Python and a dedicated Python library for natural language processing. The sentiment results are joined with the university clustering from the Ministry of Research, Technology and Higher Education, producing the positive, neutral, and negative sentiment for each of the 501 universities in 2020. The analysis then continues in RStudio using K-Means, in two steps. In step 1, the full dataset of 501 universities is processed into 5 clusters, and the similarity between the Netizen clusters and the clusters from the Ministry of Research, Technology and Higher Education is 37%. In step 2, after cleansing the records with a value of 0, 169 universities remain, and the similarity between the Netizen clusters and the clusters from the Ministry of Research, Technology and Higher Education is again 37%, the same before and after data cleansing. The novel findings derived from Netizen data are as follows. First, clusters can be derived from positive sentiment. Second, the Netizen clusters match the clusters from the Directorate General of Higher Education, Ministry of Education and Culture, in only about 37% of cases, and after data cleansing to 169 universities the match was only about 33%.
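The comparison described in this abstract can be illustrated with a minimal Python sketch. It assumes the "similarity" is the share of universities whose sentiment-based cluster equals their ministry cluster; the abstract does not state how the figure was computed. The function names, the one-dimensional K-Means on positive tweet counts, and all data values are hypothetical and are not taken from the paper.

```python
from statistics import mean

def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalar values (k >= 2); returns labels and centroids."""
    # Deterministic start: spread the initial centroids over the sorted values.
    ordered = sorted(values)
    centroids = [ordered[(len(ordered) - 1) * i // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                      for v in values]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids

def rank_clusters(labels, centroids):
    """Relabel clusters so that rank 1 has the highest centroid."""
    order = sorted(range(len(centroids)), key=lambda c: -centroids[c])
    rank_of = {c: r + 1 for r, c in enumerate(order)}
    return [rank_of[lab] for lab in labels]

def match_rate(clusters_a, clusters_b):
    """Share of items that carry the same cluster label in both lists."""
    return sum(a == b for a, b in zip(clusters_a, clusters_b)) / len(clusters_a)

# Hypothetical positive-tweet counts and ministry clusters for 8 universities.
positive_counts = [90, 85, 60, 55, 30, 28, 5, 4]
ministry_clusters = [1, 1, 2, 3, 3, 3, 4, 4]

labels, centroids = kmeans_1d(positive_counts, k=4)
netizen_clusters = rank_clusters(labels, centroids)
agreement = match_rate(netizen_clusters, ministry_clusters)
print(netizen_clusters, agreement)
```

On real data the ranking step matters: K-Means cluster indices are arbitrary, so the sentiment clusters have to be ordered before they can be compared label by label with an ordered ministry ranking.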


M. Deepa; Dr.P. Sumitra

European Journal of Molecular & Clinical Medicine, 2020, Volume 7, Issue 9, Pages 915-922

Data mining is now widely used by many institutions. Intrusion detection is one of the top priorities and challenges for network operators and security specialists. An intrusion detection system (IDS) protects sensitive data, anonymity, and device availability from attacks. An IDS uses data mining techniques to characterize the resources accessed in a database through a network. A robust algorithm must also be built to produce effective rules for detecting attacks. In this paper, classification-focused optimization algorithms are used to detect attacks on the NSL-KDD dataset. To resolve the drawbacks of the current method, an improved Hoeffding Induction Tree algorithm is presented. The results demonstrate that the proposed HITNB algorithm has improved precision, a lower alarm rate and the ability to detect a new type


K. Priya; S. Ranjana; R. Manimegala

European Journal of Molecular & Clinical Medicine, 2020, Volume 7, Issue 9, Pages 1486-1495
DOI: 10.31838/ejmcm.07.09.160

This paper discusses the concept and application of an adaptive networked e-health program. The program aims to avoid manual data entry and to improve hospital bed capacity, particularly after an incident or disaster and at mass events where huge numbers of people gather in one location. The device design is based on medical devices that use wireless sensor networks (WSNs) to measure patients' physical parameters. These sensors send information from the patient's devices to the cloud over the wireless network. The e-health smart network provides medical personnel with real-time data processing, reduces manual data collection and allows the tracking of

Driving Data Analysis Naturally for Road Safety

P.Sathya Siva Reddy; N. Deepa

European Journal of Molecular & Clinical Medicine, 2020, Volume 7, Issue 3, Pages 2201-2210

Numerous authorities have demonstrated that vehicle crashes and accidents are caused by drivers' lack of awareness. To guarantee the safety of people on the road network as far as feasible, it is vital to be able to predict drivers' safety risks in real time. Statistics published by the World Health Organization show that automobile crashes cause over 12 lakh deaths and more than 5 crore injuries every 365 days, and according to this data analysis, young people between the ages of 14 and 28 account for the peak range of these crash events. With a database built on the Hadoop tool, we can process data without size restrictions, simply add machines to the cluster, and obtain results in less time and at lower cost. We use combiners, partitioning, and bucketing methods in Hadoop. Hadoop is an open-source framework maintained by the Apache Software Foundation, and it is used for storing and processing large data sets on hardware clusters. We are employing Spark, with which we can get results a hundred times faste

Exploration Of A State Of The Art On Cardiac Diseases Prediction Techniques

S. Usha; Dr.S. Kanchana

European Journal of Molecular & Clinical Medicine, 2020, Volume 7, Issue 7, Pages 6962-6967

Healthcare is an essential task for preserving human life. Coronary heart disease is an illness that affects the human heart. Cardiovascular diseases can be forecast with the aid of several techniques that support decisions about interventions for high-risk patients, resulting in a reduction of their risks. The death rate from these diseases is very high, so it is imperative to identify whether an individual has heart disease or not. In the medical field it is very important to predict the occurrence of heart diseases. Accurate predictions based on the patient's medical history make it possible to treat the patient before an attack occurs. Data mining and machine learning techniques play an essential role in predicting the occurrence of heart diseases. These techniques diagnose the diseases with the help of datasets from healthcare centers. Various models are used to reduce the death rate. Models based on several algorithms, such as Support Vector Machine (SVM), Decision Tree (DT), Naïve Bayes (NB), K-Nearest Neighbor (KNN), and Artificial Neural Network (ANN), are implemented to predict heart disease. The accuracy of these models helps to diagnose the diseases with better results. This paper summarizes the performance of the algorithms used to predict and diagnose heart diseases.

Using Web Scraping In A Knowledge Environment To Build Ontologies Using Python And Scrapy

M. El Asikri; S. Krit; H. Chaib

European Journal of Molecular & Clinical Medicine, 2020, Volume 7, Issue 3, Pages 433-442

Web scraping, or web data extraction, is data scraping used for extracting data from websites. Web scraping software may access the World Wide Web directly using the Hypertext Transfer Protocol, or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying, in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.
In this paper, among other kinds of scraping, we focus on techniques that extract the content of a Web page. In particular, we adopt scraping techniques in the Web e-commerce field. To this end, we propose a solution for analyzing data extraction that exploits Web scraping using Python and the Scrapy framework.
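As a rough illustration of content extraction from an e-commerce page, the sketch below parses a small product listing. The paper uses Scrapy; to stay self-contained, this example swaps in `html.parser` from the Python standard library instead. The HTML snippet, the class names `product`, `name` and `price`, and the `ProductParser` class are all hypothetical.

```python
from html.parser import HTMLParser

class ProductParser(HTMLParser):
    """Collects the name and price text of each <li class="product"> entry."""

    def __init__(self):
        super().__init__()
        self.products = []   # one dict per product found
        self._field = None   # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "product":
            self.products.append({})          # start a new product record
        elif tag == "span" and css_class in ("name", "price"):
            self._field = css_class           # next text node belongs here

    def handle_data(self, data):
        if self._field and self.products:
            self.products[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None                # stop capturing text

# Hypothetical page fragment standing in for a downloaded product listing.
page = """
<ul>
  <li class="product"><span class="name">Keyboard</span> <span class="price">19.90</span></li>
  <li class="product"><span class="name">Mouse</span> <span class="price">7.50</span></li>
</ul>
"""

parser = ProductParser()
parser.feed(page)
products = parser.products
print(products)
```

A crawling framework adds request scheduling, link following and export pipelines on top of this extraction step, which is why one is usually preferred for anything beyond a single page.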

Analysis of Finite State Automata For Sequence Mining

Omkar Singh; Anant Tiwari; Ashutosh Jaiswal; Roushan Kumar

European Journal of Molecular & Clinical Medicine, 2020, Volume 7, Issue 6, Pages 3098-3108

Finite state automata (FSA) have a very understandable mathematical model; data can be represented compactly using a finite state automaton, and the system components can be compiled automatically. Sequence pattern mining is a useful part of data mining and is important for a wide range of applications, including the purchase sequences of consumers. Sequence mining (SM) means identifying the sequence patterns in a large dataset. It finds common substrings in a dataset as patterns. Most industries are interested in scanning for sequence patterns in their databases, where massive amounts of data are continuously collected. Data mining (DM) is one of the methods by which hidden patterns linked to such sequences are recovered. In sequence mining, we extract sequence patterns whose support is greater than or equal to the minimum support threshold. In this paper, we discuss various types of automata, automata in data mining, finite state automata, sequence mining, and related topics.
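The minimum support threshold can be made concrete with a short Python sketch that counts contiguous subsequences ("substrings") across a set of sequences. It is a brute-force illustration under stated assumptions, not the automata-based approach the paper surveys; the function name, the `max_len` cap and the purchase data are hypothetical.

```python
from collections import Counter

def frequent_substrings(sequences, min_support, max_len=3):
    """Return contiguous patterns contained in at least min_support sequences."""
    support = Counter()
    for seq in sequences:
        seen = set()  # count each pattern at most once per sequence
        for start in range(len(seq)):
            stop = min(start + max_len, len(seq))
            for end in range(start + 1, stop + 1):
                seen.add(tuple(seq[start:end]))
        support.update(seen)
    # Keep patterns whose support is >= the minimum support threshold.
    return {p: s for p, s in support.items() if s >= min_support}

# Hypothetical purchase sequences of four consumers.
purchases = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "eggs"],
    ["bread", "milk", "eggs", "tea"],
]

patterns = frequent_substrings(purchases, min_support=3)
print(patterns)
```

Here `("bread", "milk", "eggs")` occurs in only two sequences, so it falls below the threshold of 3 and is dropped, while both of its length-two parts are kept.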


VVS SASANK; Praveen S R Konduri; Prasanna Kumar Prathipati

European Journal of Molecular & Clinical Medicine, 2020, Volume 7, Issue 4, Pages 952-959

Most people need genuine information about online products. Before spending their money on a particular product, they can read the various reviews on the website. In this scenario, they cannot tell whether a review is fake or genuine. In general, some favorable reviews on websites are added by the company's own technical staff to make the product popular. These people belong to media and social organization teams, and they give reviews with good ratings on behalf of their own firm. Because of this falsification in website reviews, online purchasers cannot identify fake products. In this research, an SVM classification mechanism is used to detect fake reviews by using the IP address. This implementation helps users find the correct reviews of an online product. With this approach, accuracy is improved by 98.79% and the F1 score increases by 10%.

Development of Top K-Association Rule Mining for Discovering pattern in Medical Dataset

Aakriti Sharma; Anjana Sangwan; Blessy Thankchan; Sachin Jain; Veenita Singh; Shantanu Saurabh

European Journal of Molecular & Clinical Medicine, 2020, Volume 7, Issue 4, Pages 1413-1421

Association rule mining consists of discovering associations between items in transactions. It is one of the most important data mining tasks. It has been integrated into many commercial data mining software packages and has a wide variety of applications in a number of domains. Computing the top-ranked rules in a data set is nevertheless a very difficult task. Finding patterns in a large data set requires memory, computational power, and a high rate of I/O, and is possible only on a high-performance machine. In this paper, the parameters used in the computation are chosen based on minimum support and minimum confidence values. The paper proposes a new algorithm that generates association rules for the input parameters in order to find patterns in a large data set. The algorithm starts by searching for rules. As soon as a rule is found, it is added to a list of rules ordered by support. The list maintains the top N rules found so far. Once N valid rules are found, the internal minsup variable is raised to the support of the rule with the lowest support in the list. Raising minsup prunes the search space while searching for more rules. Thereafter, every time a valid rule is found, it is inserted into the list, rules that no longer meet minsup are removed from the list, and minsup is raised to the support of the least interesting rule in the list. Results show that the new method is an efficient technique for mining standard data sets on a system with minimal configuration.
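The idea of keeping a top-k list and raising an internal minsup can be sketched in Python as follows. This is a simplified illustration, not the authors' algorithm: it only considers rules with a single item on each side, keeps exactly k rules (ties at the threshold are dropped), and uses hypothetical names and transaction data.

```python
import heapq
from itertools import permutations

def top_k_rules(transactions, k, min_confidence):
    """Return the k single-item rules (a -> c) with the highest support."""
    transactions = [set(t) for t in transactions]
    items = sorted(set().union(*transactions))
    minsup = 0   # internal threshold, raised once the list holds k rules
    top = []     # min-heap of (support, antecedent, consequent)
    for a, c in permutations(items, 2):
        sup_a = sum(a in t for t in transactions)
        sup_rule = sum(a in t and c in t for t in transactions)
        if sup_rule <= minsup:
            continue  # cannot enter the list: pruned by the raised minsup
        if sup_rule / sup_a < min_confidence:
            continue  # rule is not valid
        heapq.heappush(top, (sup_rule, a, c))
        if len(top) > k:
            heapq.heappop(top)       # drop the rule with the lowest support
        if len(top) == k:
            minsup = top[0][0]       # raise minsup to the weakest kept rule
    ordered = sorted(top, key=lambda r: (-r[0], r[1], r[2]))
    return [(a, c, s) for s, a, c in ordered]

# Hypothetical symptom transactions standing in for a medical dataset.
records = [
    {"fever", "cough"},
    {"fever", "cough", "fatigue"},
    {"fever", "fatigue"},
    {"cough", "fever"},
    {"fatigue"},
]

rules = top_k_rules(records, k=2, min_confidence=0.6)
print(rules)
```

The appeal of this design is that the user chooses how many rules to see rather than guessing a support threshold in advance; the threshold is discovered during the search.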