Online ISSN: 2515-8260

Keywords: Data Mining


Leema Raina.F, Dr.T. KamalaKannan

European Journal of Molecular & Clinical Medicine, 2022, Volume 9, Issue 7, Pages 9379-9386

Educational data mining is the process of transforming raw data from educational systems into useful information. It serves educational software developers, teachers, parents, students, and education researchers. Interest in combining education and data mining is increasing, and a research community is growing around it. Data mining has been applied to data from traditional classroom systems, well-known learning content management systems, particular web-based courses, and adaptive and intelligent web-based systems.
Each of these systems has a different data source and different knowledge discovery objectives. After the data are pre-processed in each case, data mining techniques are applied: statistics and visualization, clustering, classification and outlier detection, association rule mining and pattern mining, and text mining. Educational data mining is becoming a mature area, and many researchers see a need for specialized work in it. The Weka and RapidMiner tools are used to obtain the most accurate predictions.


Manogna vengala, Dr. CH N Santhosh Kumar

European Journal of Molecular & Clinical Medicine, 2022, Volume 9, Issue 7, Pages 5514-5521

This article provides a comprehensive review of new perspectives and systematic interpretations of the published literature, which it organises into subcategories. It discusses the fundamental ideas behind the numerous existing privacy-preserving data mining methods, as well as the benefits and drawbacks of these technologies. The techniques currently available for protecting user privacy during data mining can be categorised by several aspects: distortion, association rule, association rule hiding, taxonomy, clustering, associative classification, outsourced data mining, distributed, and k-anonymity. The most salient advantages and disadvantages of the procedures are outlined within their respective categories. This in-depth investigation highlights the historical development, current research issues, future trends, gaps, and deficiencies. The review concludes that further significant improvements are needed to provide stronger protection and preservation of personal privacy.


M. Deepa; Dr.P. Sumitra

European Journal of Molecular & Clinical Medicine, 2020, Volume 7, Issue 9, Pages 915-922

Data mining is now widely used by many institutions. Intrusion detection is one of the top priorities and challenges for network operators and security specialists. An intrusion detection system (IDS) protects sensitive data, anonymity, and device availability from attacks. An IDS uses data mining techniques to characterize network resources from the data held in its database. A robust algorithm must also be built to produce effective rules for detecting attacks. In this paper, classification-based optimization algorithms are used to detect attacks in the NSL KDD dataset. To address this constraint, the paper presents an improved Hoeffding Induction Tree algorithm that resolves the drawbacks of the current method. The results demonstrate that the proposed HITNB algorithm has improved precision, a lower alarm rate and the ability to detect a new type


K. Priya; S. Ranjana; R. Manimegala

European Journal of Molecular & Clinical Medicine, 2020, Volume 7, Issue 9, Pages 1486-1495
DOI: 10.31838/ejmcm.07.09.160

This paper discusses the concept and application of an adaptive networked e-health program. The program is aimed at avoiding manual data entry and improving hospital bed capacity, in particular after an incident or a disaster and especially at mass events where a large number of people gather in one location. The device design is based on medical devices that use wireless sensor networks (WSNs) to measure patients' physical parameters. These sensors send information from the patient's devices to the cloud over the wireless network. The e-health smart network provides medical personnel with real-time data processing, reduces manual data collection and allows the tracking of

Clustering Analysis from Universities in Indonesia based on Sentiment Analysis

Hendra Achmadi; Isana Meranga; Dewi Wuisan; Irwan Suarly; I Gusti Anom Yudistira; Rudy Pramono

European Journal of Molecular & Clinical Medicine, 2020, Volume 7, Issue 10, Pages 1466-1481

Two kinds of sources can be used to judge the quality of a university in Indonesia. The first is the clustering list of Indonesian universities published by the Ministry of Research, Technology and Higher Education; the second is social media data, such as Twitter. In this research we use text mining and data mining methods to build a sentiment analysis from 50,100 tweets to assess 501 universities, using Python and a Python library for natural language processing. The sentiment results are joined with the university clustering from the Ministry of Research, Technology and Higher Education, producing positive, neutral, and negative sentiment for each of the 501 universities in 2020. The classification then continues in RStudio using K-Means, in two steps. Step 1 processes the dataset of 501 universities and builds 5 clusters; the similarity between the netizen clusters and the Ministry's clusters is 37%. Step 2 removes the zero values, leaving 169 universities; the similarity between the netizen clusters and the Ministry's clusters is again 37%, the same before and after data cleansing. Two findings can be derived from netizens. First, clusters can be derived from positive sentiment. Second, the netizen clusters match the clusters from the Directorate General of Higher Education, Ministry of Education and Culture, for only about 37% of universities, and after data cleansing the 169 universities match for only about 33%.
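The comparison between sentiment-based clusters and an official cluster list can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the abstract does not say how the similarity percentage was computed, so the sketch assumes clusters are ranked by their mean positive-sentiment share and counted as matching when the rank equals the official cluster number. The function names `kmeans_1d` and `match_rate` and the toy values are hypothetical.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalars (k >= 2).

    Returns one cluster rank per value, where rank 1 is the cluster
    with the highest centroid.
    """
    srt = sorted(values)
    # Deterministic initialisation: spread centroids over the sorted values.
    centroids = [srt[(i * (len(srt) - 1)) // (k - 1)] for i in range(k)]
    assign = []
    for _ in range(iters):
        # Assignment step: nearest centroid for every value.
        assign = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid becomes the mean of its members.
        new = []
        for c in range(k):
            members = [v for v, a in zip(values, assign) if a == c]
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    order = sorted(range(k), key=lambda c: -centroids[c])
    rank = {c: r + 1 for r, c in enumerate(order)}
    return [rank[a] for a in assign]


def match_rate(labels_a, labels_b):
    """Fraction of items that carry the same cluster label in both lists."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


# Hypothetical toy data: positive-sentiment share per university and
# an official cluster number for the same universities.
positive_share = [0.9, 0.85, 0.5, 0.45, 0.1, 0.15]
official_cluster = [1, 2, 2, 2, 3, 1]

netizen_cluster = kmeans_1d(positive_share, k=3)
rate = match_rate(netizen_cluster, official_cluster)
```

Because K-Means labels are arbitrary, some alignment step such as the ranking used here is needed before two clusterings can be compared label by label.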

Exploration Of A State Of The Art On Cardiac Diseases Prediction Techniques

S. Usha; Dr.S. Kanchana

European Journal of Molecular & Clinical Medicine, 2020, Volume 7, Issue 7, Pages 6962-6967

Healthcare is an essential part of human life. Coronary heart disease is an illness that affects the human heart. Cardiovascular diseases can be predicted with several techniques that support decisions about changes for high-risk patients, which reduces their risks. The death rate from these diseases is very high, so it is imperative to identify whether a person has heart disease or not. In the medical field, predicting the occurrence of heart disease is very important. Accurate predictions from a patient's medical history make it possible to treat the patient before an attack occurs. Data mining and machine learning techniques play an essential role in predicting the occurrence of heart disease. These techniques diagnose the diseases with the help of datasets from healthcare centers. Various models are used to reduce the death rate. Models based on algorithms such as Support Vector Machine (SVM), Decision Tree (DT), Naïve Bayes (NB), K-Nearest Neighbor (KNN), and Artificial Neural Network (ANN) are implemented to predict heart disease. The accuracy of these models helps to diagnose the diseases with better results. This paper summarizes the performance of the algorithms used to predict and diagnose heart disease.

Analysis of Finite State Automata For Sequence Mining

Omkar Singh; Anant Tiwari; Ashutosh Jaiswal; Roushan Kumar

European Journal of Molecular & Clinical Medicine, 2020, Volume 7, Issue 6, Pages 3098-3108

Finite state automata (FSA) have a very understandable mathematical model: data can be represented compactly with a finite state automaton, and system components can be compiled automatically. Sequence pattern mining is a useful part of data mining and is important for a wide range of applications, including the purchase sequences of consumers. Sequence mining (SM) means identifying the sequence patterns of a large dataset. It finds common substrings in a dataset as patterns. Most industries are interested in scanning their databases for sequence patterns, since massive data are continuously collected. Data mining (DM) is one of the methods by which hidden patterns in such sequences are recovered. In sequence mining, we extract the sequence patterns whose support is greater than or equal to the minimum support threshold. In this paper, we discuss various types of automata, automata in data mining, finite state automata, sequence mining, and related topics.
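The minimum support threshold described above can be illustrated with a short Python sketch. It is a brute-force count of contiguous subsequences, not the automata-based approach the paper discusses; the function name `frequent_substrings`, the `max_len` limit, and the toy purchase data are assumptions made for illustration.

```python
from collections import Counter


def frequent_substrings(sequences, min_support, max_len=3):
    """Return contiguous subsequences found in at least `min_support` sequences.

    Support is counted once per sequence, however often the pattern
    repeats inside that sequence.
    """
    counts = Counter()
    for seq in sequences:
        seen = set()
        for n in range(1, max_len + 1):
            for i in range(len(seq) - n + 1):
                seen.add(tuple(seq[i:i + n]))
        counts.update(seen)
    return {pattern: c for pattern, c in counts.items() if c >= min_support}


# Hypothetical purchase sequences of three consumers.
purchases = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "eggs"],
]
patterns = frequent_substrings(purchases, min_support=2)
```

Here the pattern `("bread", "milk")` is kept because two sequences contain it, while `("bread", "milk", "eggs")` occurs in only one sequence and falls below the threshold.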


VVS SASANK; Praveen S R Konduri; Prasanna Kumar Prathipati

European Journal of Molecular & Clinical Medicine, 2020, Volume 7, Issue 4, Pages 952-959

Most people want genuine information about a product sold online. Before spending money on a particular product, they analyze the reviews on the website, but they cannot tell whether a review is fake or genuine. Some favorable reviews are added by the company's own technical staff to make the product popular. These people belong to media and social organization teams, and they give their own firm's products good ratings. Because of this falsification in the reviews, online purchasers cannot recognize what is fake. In this research, an SVM classification mechanism is used to detect fake reviews by using IP addresses. This implementation helps users find genuine reviews of an online product. The reported accuracy improvement is 98.79%, and the F1 score increases by 10%.

Development of Top K-Association Rule Mining for Discovering pattern in Medical Dataset

Aakriti Sharma; Anjana Sangwan; Blessy Thankchan; Sachin Jain; Veenita Singh; Shantanu Saurabh

European Journal of Molecular & Clinical Medicine, 2020, Volume 7, Issue 4, Pages 1413-1421

Association rule mining is the discovery of associations between items in transactions. It is one of the most important data mining tasks. It has been integrated into many commercial data mining packages and has a wide variety of applications across domains. Computing the top-ranked rules in a data set is nevertheless a very difficult task. Finding patterns in a large data set requires memory, computational power, and a high rate of I/O, and is possible only on a high-performance machine. In this paper, the parameters used for the computation are the minimum support and minimum confidence values. The paper proposes a new algorithm that generates association rules from these input parameters to find patterns in a large data set. The algorithm starts by searching for rules. As soon as a rule is found, it is added to a list of rules ordered by support. The list maintains the top N rules found so far. Once N valid rules have been found, the internal minsup variable is raised to the support of the least-supported rule in the list. Raising minsup prunes the search space while searching for more rules. Thereafter, every time a valid rule is found, it is inserted into the list, rules that no longer satisfy minsup are removed from the list, and minsup is raised to the support of the least-supported rule in the list. The results show that the new method is an efficient technique for mining standard data sets on a system with minimal configuration.
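The list-and-threshold mechanism described in this abstract can be sketched in Python. This is a simplified illustration, not the authors' algorithm: it only considers rules with one item on each side, uses absolute support counts, and keeps the list as a min-heap. The function name `top_k_rules` and the toy transactions are hypothetical.

```python
import heapq
from itertools import permutations


def top_k_rules(transactions, k, min_conf):
    """Return up to k single-item rules (support, antecedent, consequent).

    An internal `minsup` starts at 0 and is raised to the lowest support
    in the list once k rules are held, which prunes later candidates.
    """
    tx = [set(t) for t in transactions]
    items = sorted(set().union(*tx))
    heap = []    # min-heap, so heap[0] is the least-supported rule kept
    minsup = 0   # internal threshold, as an absolute count
    for a, b in permutations(items, 2):
        sup_a = sum(1 for t in tx if a in t)
        if sup_a < minsup:
            continue  # a rule's support cannot exceed its antecedent's
        sup_ab = sum(1 for t in tx if a in t and b in t)
        if sup_ab == 0 or sup_ab < minsup:
            continue
        if sup_ab / sup_a < min_conf:
            continue
        heapq.heappush(heap, (sup_ab, a, b))
        if len(heap) > k:
            heapq.heappop(heap)   # drop the least-supported rule
        if len(heap) == k:
            minsup = heap[0][0]   # raise the internal threshold
    return sorted(heap, reverse=True)


# Hypothetical transactions for illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["bread", "eggs"],
    ["milk"],
]
rules = top_k_rules(baskets, k=2, min_conf=0.6)
```

With these baskets the rule eggs → bread enters the list early with support 2, and is later displaced when milk → bread arrives with support 3, at which point the threshold rises to 3.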

Driving Data Analysis Naturally for Road Safety

P.Sathya Siva Reddy; N. Deepa

European Journal of Molecular & Clinical Medicine, 2020, Volume 7, Issue 3, Pages 2201-2210

Many authorities have demonstrated that vehicle crashes and accidents are caused by drivers' lack of awareness. To guarantee the security of people on the road network as far as feasible, it is vital to be able to predict drivers' safety risks in real time. Statistics published by the World Health Organization show that automobile crashes cause over 12 lakh deaths and more than 5 crore injuries every year, and according to this data analysis, young people between the ages of 14 and 28 account for the peak range of these crash events. Using the Hadoop tool, we can process data without size restrictions, simply add machines to the cluster, and obtain results in less time and at lower cost; we use combiners, partitioning, and bucketing methods in Hadoop. Hadoop is an open-source framework maintained by the Apache Software Foundation and is used for storing and processing large data sets on commodity hardware. By employing Spark, we can get results a hundred times faster

Using Web Scraping In A Knowledge Environment To Build Ontologies Using Python And Scrapy

M. El Asikri; S. Krit; H. Chaib

European Journal of Molecular & Clinical Medicine, 2020, Volume 7, Issue 3, Pages 433-442

Web scraping, or web data extraction, is data scraping used for extracting data from websites. Web scraping software may access the World Wide Web directly using the Hypertext Transfer Protocol, or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying, in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.
In this paper, among other kinds of scraping, we focus on techniques that extract the content of a web page. In particular, we adopt scraping techniques in the field of web e-commerce. To this end, we propose a solution for data extraction that exploits web scraping using Python and the Scrapy framework.

Text Mining Based on Tax Comments as Big Data Analysis Using XGBOOST and Feature Selection


European Journal of Molecular & Clinical Medicine, 2017, Volume 4, Issue 1, Pages 150-157

With the rapid development of the Internet, big data has been applied in many areas. However, high-dimensional data often contain redundant or irrelevant features, so feature selection is especially important. Subsets with new features are built and machine learning algorithms, including XGBoost, are applied to them. Obtaining early warning information with high reliability and in real time by applying big data theory, systems, models, and methods, as well as machine learning techniques, is the inevitable trend of the future. This study proposes that fast feature selection using the XGBoost model in a distributed environment can improve model training efficiency under distributed conditions. The GBTs model, based on the gradient-optimized decision tree, was superior to the other two models in terms of accuracy and real-time performance, which meets the requirements of a big data setting. It runs on a single machine as well as on the distributed processing frameworks Apache Hadoop and Apache Spark. Gradient descent can be used for the gradient boosting model. In the case of a regression tree, leaf nodes produce an average gradient among samples with similar features. Feature selection is a fundamental step in data preprocessing and an important research topic in data mining and machine learning tasks such as classification.
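The statement that leaf nodes of a regression tree produce an average gradient can be illustrated with a small Python sketch of gradient boosting. It is a minimal example under stated assumptions and does not use XGBoost: the loss is squared error (so the negative gradient is the residual), each tree is a one-split stump on a single feature, and the input must contain at least two distinct feature values. The names `fit_stump` and `gradient_boost` and the toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree; each leaf holds the mean residual."""
    best = None
    for thr in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, thr, left_mean, right_mean)
    _, thr, left_mean, right_mean = best
    return lambda x: left_mean if x <= thr else right_mean


def gradient_boost(xs, ys, rounds=50, lr=0.1):
    """Gradient boosting for squared loss with stumps as base learners."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(rounds):
        # For squared loss the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Gradient descent step in function space, scaled by the learning rate.
        preds = [p + lr * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + lr * sum(s(x) for s in stumps)


# Hypothetical toy data: a step function of one feature.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, 5, 5, 5]
model = gradient_boost(xs, ys)
```

Each round shrinks the remaining residual by the learning rate, so after 50 rounds the predictions on this toy data lie close to 1 for small inputs and close to 5 for large ones.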