Document Type: Research Article
Diabetes mellitus is one of the major non-communicable diseases and has a great impact on human life today. A huge amount of patient-related data is generated, including Electronic Medical Records (EMRs), pharmacy reports, and laboratory reports, among other sources. Big data analytics can be applied to this data to reveal useful patterns and relationships among the factors that affect diabetes. The results of this analysis show relationships between attributes that can be used to improve the healthcare system. In this paper, the diabetes dataset is analyzed using the Hadoop framework, a distributed framework that can be used to analyze large amounts of data. The dataset is taken from the PIMA Indian Database and includes factors that affect diabetes, such as age, blood pressure, BMI (Body-Mass Index), and skin thickness. The results of the analysis are visualized in Power BI.