Nonalcoholic fatty liver disease (NAFLD) is the most common chronic liver disease, and its prevalence has reached global epidemic proportions. Although the disease is relatively benign in the early stages, severe clinical forms, including nonalcoholic steatohepatitis (NASH), cirrhosis and even hepatocellular carcinoma, worsen the long-term prognosis when they occur. A growing body of evidence indicates that NAFLD develops from a complex process in which many factors, including genetic susceptibility and environmental insults, are involved. In this review, we focused on the genetic component of NAFLD, with special emphasis on the role of genetics in the disease pathogenesis and natural history. Insights into the topic of the genetic susceptibility in lean individuals with NAFLD and the potential use of genetic tests in identifying individuals at risk are also discussed.
Nonalcoholic fatty liver disease (NAFLD), whose prevalence has reached global epidemic proportions, is the most common chronic liver disease. Although the disease is relatively benign in the early stages, when severe clinical forms such as nonalcoholic steatohepatitis (NASH), cirrhosis and even hepatocellular carcinoma (HCC) occur, the long-term prognosis worsens. The most dramatic event in the natural history of the disease is the development of NAFLD-related HCC. Hence, knowledge of the disease pathogenesis and predisposing factors is crucial for understanding the disease biology and making decisions on diagnostic or therapeutic interventions, the latter being the main goal of Precision Medicine.
A growing body of evidence indicates that the disease develops as a result of a complex process in which many factors, including genetic susceptibility and environmental insults, are involved [2,4]. Furthermore, NAFLD severity and progression are modulated by epigenetic factors, including liver-specific DNA-methylation changes and microRNAs that significantly modulate the liver transcriptome [5-11].
In this review, we focused on the genetic component of NAFLD, with special emphasis on the role of genetics in the disease pathogenesis and natural history. Insights into the topic of the genetic susceptibility in lean individuals with NAFLD and the potential use of genetic tests in identifying individuals at risk are also discussed.
NAFLD IS A POLYGENIC AND HERITABLE DISEASE
NAFLD is a heritable and complex trait, the genetic component of which has been explored by a myriad of studies that used different approaches illustrated in Figure 1.
Robust evidence from population-based and familial-aggregation studies, as well as twin studies, has provided in-depth knowledge on NAFLD or NAFLD-related outcomes. According to the available data, the heritability estimates range from 20 to 70%, depending on the study design, ethnicity and the methodology (including imaging technology) used to characterize the phenotype (Fig. 2). While some of the imaging technologies employed are highly affordable and easy to implement in clinical settings, others are highly sensitive and specific but expensive. Among those, liver ultrasound (US), abdominal computed tomography (CT) and magnetic resonance spectroscopy (MRS) are most widely utilized for measuring, either qualitatively or quantitatively, the amount of liver fat infiltration.
Struben and colleagues provided the first evidence on the heritability of NAFLD by examining familial forms of cryptogenic cirrhosis; the study included 18 members of eight kindreds containing two or more afflicted members. The authors observed that NASH coexisted within four kindreds, whereby the afflicted patients formed mother-daughter, sister-sister, sister-brother, father-daughter, and male-female cousin dyads.
More recent reports based on large populations have yielded more precise estimations of the magnitude of NAFLD heritability. For example, Speliotes and investigators from the collaborative GIANT, MAGIC and GOLD population-based consortia reported the heritability estimates of hepatic steatosis as 26−27% in a large (n=6,629) sample of subjects of European descent. Wagenknecht and coworkers from the Insulin Resistance Atherosclerosis Study (IRAS) Family Study reported the heritability of NAFLD in 795 Hispanic American and 347 African American adults and found similar figures of about 31%. An interesting aspect of the study by Wagenknecht et al. is the large disparity in NAFLD heritability between cohorts of different ethnicities: it was greater in the Hispanic American cohort (33%) than in the African American cohort (14%).
Palmer and investigators from the Jackson Heart Study (JHS), the Atherosclerosis Risk in Communities Study (ARIC), the Insulin Resistance Atherosclerosis Family Study (IRASFS), the Genetic Epidemiology Network of Arteriopathy (GENOA), and the Family Heart Study (FamHS), which included five African American (n=3,124) and one Hispanic American (n=849) cohort, also reported differences in the heritability estimates between African American (20%) and Hispanic American (34%) families.
Schwimmer et al. performed a familial aggregation study of overweight children with biopsy-proven NAFLD and family members in whom the phenotype was assessed by MRS, and observed that fatty liver was significantly more common in siblings (59%) and parents (78%) of children with NAFLD as compared with obese children without family history of the disease. Recently, results from The Genetics of NAFLD in Twins Consortium illustrated the heritability of hepatic steatosis (based on MRI-proton-density fat fraction) and fibrosis (based on stiffness measured by magnetic resonance elastography), which was estimated at ~50%. Cui et al. from the same group also found in the same cohort of twins a high level (75.6%) of shared genetic effect between steatosis and fibrosis. However, the authors surprisingly reported almost absent environmental effects. In contrast, results pertaining to another twin cohort that included 208 adult Hungarian twins (63 monozygotic and 41 dizygotic pairs), while providing no support for the heritability of NAFLD, suggest a large shared and unshared environmental effect (74.2% and 25.8%, respectively). The heritability of NAFLD and its association with abnormal vascular parameters were also assessed in this study, showing that NAFLD is concomitant with carotid plaques or carotid intima media thickness. The association of NAFLD and carotid plaques was previously proven by a meta-analysis of 3,497 subjects (1,427 patients and 2,070 controls). Moreover, genetic co-variance assessment of metabolic syndrome (MetS)-associated traits in The Genetics of NAFLD in Twins Consortium revealed a significant association between hepatic steatosis and body mass index (BMI) and hyperinsulinemia, and between hepatic fibrosis and glycated haemoglobin (HbA1c). It is noteworthy that the available data mining strategies and systems biology approaches strongly suggest genetic commonality between NAFLD and MetS.
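As a rough illustration of how twin data translate into heritability figures such as those discussed above, the classical Falconer approximation splits trait variance into genetic and environmental components from the monozygotic and dizygotic twin correlations. This is only a minimal sketch: the cited twin studies may well have relied on more elaborate structural-equation (ACE) models, and the function name and the input correlations below are hypothetical, not values taken from any of these cohorts.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Falconer approximation of variance components from twin correlations.

    r_mz, r_dz: trait correlations in monozygotic and dizygotic twin pairs.
    Returns the proportions attributed to additive genetics (A),
    shared environment (C) and unshared environment (E).
    Estimates are not clamped, so values outside [0, 1] signal that
    this simple model fits the data poorly.
    """
    a = 2.0 * (r_mz - r_dz)   # heritability estimate
    c = 2.0 * r_dz - r_mz     # shared (familial) environment
    e = 1.0 - r_mz            # unshared environment plus measurement error
    return {"A": a, "C": c, "E": e}


# Hypothetical correlations, for illustration only.
components = falconer_ace(r_mz=0.60, r_dz=0.35)
print({name: round(value, 2) for name, value in components.items()})
```

Under this approximation, a cohort in which monozygotic and dizygotic correlations are nearly equal yields a heritability close to zero and a large shared-environment term, which is the general pattern described for the Hungarian twin cohort.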
GENETIC FACTORS THAT INFLUENCE THE SEVERITY AND NATURAL HISTORY OF NAFLD
The role of genetic variation in NAFLD, specifically single nucleotide polymorphisms (SNPs), has been the focus of extensive research in the last decade, including classical candidate gene association studies, as well as novel genome-wide association studies (GWAS) [13,22-26] and exome-wide association studies (EWAS) (Fig. 1).
Authors of the candidate gene association studies have identified several loci associated with the disease susceptibility and progression, including variation in transcription factors involved in the circadian rhythm (CLOCK transcription factor), the signal transducer and activator of transcription 3 (STAT3), the multidrug-resistance-associated protein gene (ABCC2), and the nuclear pregnane X receptor (PXR) [28-31]. Authors of other association studies reported variants in genes involved in metabolic and inflammatory pathways, whereby the candidate genes for NAFLD were selected either on the basis of their known or presumed function, or due to their biological plausibility in the disease pathophysiology.
Modeling strategies that included systems biology approaches, integration of gene-protein interactions and prediction of associated pathways showed that the majority of loci reported in association with NAFLD or NASH were related not only to the regulation of lipid homeostasis and the cellular lipid metabolic process, but also to pathways involved in cardiovascular system regulation and nuclear receptors. We indeed predicted for the first time the role of NR1H4 or farnesoid X nuclear receptor, a ligand-activated transcription factor that functions as a receptor for bile acids, and of RXRA or retinoid X receptor, a nuclear receptor that mediates the biological effects of retinoids through its involvement in retinoic acid-mediated gene activation. This information has been the platform for attractive pharmacological targets that were further assayed and validated as treatment strategies in NASH [33,34].
The first GWAS of NAFLD, in which 9,229 nonsynonymous sequence variations were screened, contributed significantly to our knowledge of the genetic component of the disease. The nonsynonymous rs738409 C/G variant in PNPLA3 (patatin-like phospholipase domain containing 3, also known as adiponutrin or calcium-independent phospholipase A2-epsilon), which encodes the amino acid substitution I148M, is regarded as the major genetic component of NAFLD and NASH. The risk effect of rs738409 on developing fatty liver in the context of NAFLD is the strongest ever reported for a common variant modifying the genetic susceptibility of NAFLD (5.3% of the total variance) [35,36]. The rs738409 variant is significantly associated not only with the accumulation of fat in the liver (the liver fat content in carriers of the GG homozygous genotype is 73% higher than that measured in carriers of the CC genotype) but also with the histological disease severity and progression of NAFLD (odds ratio, OR, 1.88 per G allele; 95% confidence interval, CI, 1.03–3.43; GG vs. CC homozygous carriers OR 3.488, 95% CI 1.859–6.545) [35,36]. The rs738409 variant also explains a fraction of the sexual dimorphism associated with NAFLD, as a significantly higher effect was demonstrated in women compared with men. Conversely, association analysis of rs738409 and MetS-associated diseases did not reveal any link with obesity or type 2 diabetes, as summarized in a meta-analysis.
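For readers who wish to reuse the odds ratios quoted above (for instance in a pooled analysis), the standard error on the log scale can be recovered from the reported confidence limits. The sketch below assumes that the published intervals are Wald-type 95% intervals, symmetric around ln(OR); this is an assumption rather than something stated in the cited studies, and the function name is illustrative.

```python
import math


def log_or_summary(lower: float, upper: float, z: float = 1.96) -> tuple:
    """Recover the OR point estimate and the SE of ln(OR) from a 95% CI.

    Assumes a Wald interval, i.e. one that is symmetric around ln(OR).
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    point = math.exp((log_lo + log_hi) / 2.0)  # geometric mean of the bounds
    se = (log_hi - log_lo) / (2.0 * z)         # standard error on the log scale
    return point, se


# Confidence limits quoted in the text for rs738409.
per_allele_or, per_allele_se = log_or_summary(1.03, 3.43)
gg_vs_cc_or, gg_vs_cc_se = log_or_summary(1.859, 6.545)
print(round(per_allele_or, 3), round(per_allele_se, 3))
print(round(gg_vs_cc_or, 3), round(gg_vs_cc_se, 3))
```

The midpoints recovered this way agree, to rounding, with the reported ORs of 1.88 and 3.488, which is consistent with the symmetric-interval assumption.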
Extensive knowledge on the role of PNPLA3 in the regulation of liver metabolic functioning was recently gained by employing different strategies, including functional in vitro studies and experimental animals. PNPLA3 is a multifunctional enzyme with both triacylglycerol (TAG) lipase and acylglycerol O-acyltransferase activity that participates in TAG hydrolysis and the acyl-CoA independent transacylation of acylglycerols, as reviewed recently. The promoter activity of PNPLA3 is upregulated by glucose concentrations in a dose-dependent manner. The effect of the rs738409 variant has also been the subject of extensive research in the last decade, which has led to the consensus that the NAFLD-risk G allele is associated with a loss of function. Collectively, the available evidence indicates that the variant participates in hepatocyte triacylglycerol remodeling [39-41]. On the other hand, we recently uncovered a novel role of rs738409 in global liver metabolism by performing high-throughput metabolic profiling of PNPLA3 siRNA-silencing and overexpression of wild type and mutant Ile148Met variants (isoleucine/methionine substitution at codon 148) in Huh-7 cells. Of note, silencing of PNPLA3 was associated with a global perturbation of Huh-7 hepatoma cells that resembled a catabolic response associated with protein breakdown. Overexpression of the PNPLA3 Met148 variant was associated with a 1.75-fold increase in lactic acid in comparison with the empty vector, suggesting a shift to anaerobic metabolism and mitochondrial dysfunction. Together, these results might explain the implication of the variant in disease progression.
The GWAS strategy was also employed by other groups in the search for the genetic component of NAFLD, and their respective studies differed in the populations involved, study design, sample size, and the approach used to characterize the liver phenotype. For example, Chalasani et al. focused on female adults with NAFLD diagnosed by liver biopsy, while Speliotes and colleagues explored the heritability of hepatic steatosis at the population level with abdominal CT. On the other hand, Feitosa et al. utilized a combined approach of abdominal CT and alanine aminotransferase (ALT) levels as a surrogate of disease severity, while Kawaguchi and colleagues focused on establishing the genetic risk in Asian-descent patients [24,25] and DiStefano et al. measured liver fat content in morbidly obese individuals. The GWAS strategy was also used to explore the genetic loci that specifically influence liver enzyme levels in the population, including ALT [44,45].
Collectively, GWAS on NAFLD uncovered not only highly replicated variants such as rs738409, but also variants in loci whose function is diverse in the context of NAFLD. For instance, variants in the following loci were all reported to be associated with NAFLD: 1) PPP1R3B (protein phosphatase 1, regulatory subunit 3B), which is implicated in the regulation of glycogen synthesis in liver or skeletal muscle; 2) FDFT1 (farnesyl-diphosphate farnesyltransferase), which is involved in cholesterol biosynthesis; 3) ERLIN1 (ER lipid raft associated 1), which mediates endoplasmic reticulum-associated degradation; 4) LTBP3 (latent transforming growth factor beta binding protein 3), which plays a structural role in the extracellular matrix; 5) PARVB (parvin beta), which plays a role in cytoskeleton organization and cell adhesion; and 6) the NCAN/TM6SF2/CILP2/PBX4 multilocus.
The results obtained in the first NAFLD-EWAS prompted several replication studies in different populations around the world [46-51]. Their combined findings have definitively confirmed that the causal variant in the multi-gene locus named NCAN/TM6SF2/CILP2/PBX4 is the nonsynonymous rs58542926 variant located in the TM6SF2 (transmembrane 6 superfamily member 2) gene. In the initial study, Kozlitina et al. demonstrated that rs58542926, which encodes the amino acid substitution p.Glu167Lys (E167K), was significantly associated with hepatic triglyceride content (HTGC), as measured by proton magnetic resonance spectroscopy (H-MRS). The authors showed that the effect of rs58542926 on HTGC was independent of the effect mediated by rs738409, obesity, insulin resistance (as assessed by HOMA-IR), or alcohol intake. However, authors of the subsequent studies reported conflicting results regarding the association with fatty liver, histological steatosis, NASH or fibrosis. In fact, the association with liver fibrosis remains to be confirmed, as most of the studies cited above showed that the association does not withstand adjustment for NASH or is not statistically significant [49,51].
Authors of functional studies on the TM6SF2 gene demonstrated that this locus and the mentioned variant are relevant for NAFLD biology. Specifically, Kozlitina et al. showed in vitro that murine hepatoma cells expressing the Lys167-TM6SF2 (K variant) protein have reduced expression levels compared with the wild-type, while Mahdessian et al. demonstrated that TM6SF2 is localized in the endoplasmic reticulum and the ER-Golgi intermediate compartment of human liver cells. Available experimental evidence indicates that, while TM6SF2 protein is required to mobilize neutral lipids for VLDL assembly, it is not required for secretion of apoB-containing lipoproteins.
Our group explored the level of liver TM6SF2 expression in subjects with NAFLD at different stages of disease severity, observing that TM6SF2 protein expression was significantly reduced in the liver of patients with NAFLD. In addition, we found that liver TM6SF2 immunoreactivity was reduced in carriers of the NAFLD-risk T allele (Lys167). Moreover, allele-specific expression analysis of cDNA isolated from the liver tissue confirmed that the expression level of rs58542926-T is about 56% of that of the C allele. These findings suggest that the TM6SF2 NAFLD-risk T allele is associated with decreased gene and protein expression in the liver of affected patients.
TM6SF2-rs58542926 presents an interesting clinical paradox: while the C (Glu167) allele is consistently associated with increased cardiovascular risk through higher circulating LDL-cholesterol, the T allele (Lys167) is associated with NAFLD and NASH [46-51]. In fact, the impact of the variant on the risk of cardiovascular disease (CVD) is explained by the protection that the T allele confers against elevated blood lipid levels. Finally, summarized evidence of the association of rs58542926 and the level of serum transaminases indicates that ALT (n=94,414) and AST (n=93,809) levels are significantly associated with the variant in NAFLD, but not in other chronic liver diseases, including chronic hepatitis C and B. However, this increase represents ~2.5 (9.8%) and 1.2 (5%) IU/L of ALT and AST, respectively, which is relatively small compared with the large effect of the PNPLA3-rs738409 variant.
Finally, a variant in the GCKR (glucokinase regulator) locus, also uncovered by NAFLD-GWAS, has recently gained the attention of researchers due to its biological plausibility in the disease pathogenesis. Specifically, the missense variant rs780094 was associated with a modest risk of having a fatty liver, whereby summarized evidence demonstrates a ~1.2-fold higher risk of developing NAFLD. Interestingly, GCKR mutations have been implicated in maturity-onset diabetes of the young, which is relevant because diabetes, glucose intolerance and insulin resistance are well-known risk factors for NAFLD.
A missense rs641738 C>T variant formally located in the transmembrane channel-like 4 gene (TMC4), but also within a few hundred bases of the 3' untranslated region of MBOAT7 (membrane bound O-acyltransferase domain-containing 7 gene), was recently associated with the risk of NAFLD in a European Caucasian population, but not in individuals of other ethnicities. Unfortunately, these findings were not further investigated in studies involving other populations of patients and controls of European descent. Thus, the association with MBOAT7 should be further confirmed or refuted in other populations.
In summary, available and replicated evidence on the genetic risk of NAFLD suggests the presence of at least three missense variants in three different but biologically plausible loci (PNPLA3, TM6SF2 and GCKR) associated with the disease severity and progression. Nevertheless, the NAFLD-associated variants have a quite diverse effect on the susceptibility of NAFLD, from intermediate (OR ~3.4) to low (OR 1.2). This, in turn, results in a rather diverse frequency of the risk allele (MAF) (from ~30% for rs738409 to 7% for rs58542926) (Fig. 3). Collectively, these results support the notion of common variants in the pathogenesis of NAFLD, which are presently deemed the major contributors to the disease risk. This observation also highlights the paucity of information on the role of rare variants, as well as structural variation, gene-by-gene interaction, and gene-by-environment interaction in the biology of the disease, all of which must be explored further as they likely contribute to the disease heritability (Fig. 3). Of note, genome-wide exploration of mitochondrial DNA has revealed a substantial proportion of the missing heritability of NAFLD and the disease severity associated with genetic variation in genes of the oxidative-phosphorylation chain (OXPHOS). In fact, we have shown that patients with different degrees of fibrosis have an overall enrichment of 1.4-fold mutation rate. It is noteworthy that epigenetic marks not only in nuclear but also mitochondrial DNA-encoded genes, which are by definition transmitted among generations and may explain part of the missing heritability, may be pivotal for the development and progression of NAFLD, as was reported by our group for the first time [7,9,10]. Figure 3 highlights the major milestones in the attainment of knowledge on NAFLD, from the first phenotypic description of the disease to the first scoring system for the histological assessment of NASH by Brunt and coworkers and major findings yielded by GWAS [13,26,27].
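To give a sense of what the risk-allele frequencies quoted above imply at the population level, the expected genotype proportions can be sketched under Hardy-Weinberg equilibrium. The equilibrium assumption (random mating, no selection or stratification) is ours and is not claimed by the cited studies; the frequencies are the approximate values mentioned in the text, and the function name is illustrative.

```python
def hardy_weinberg(risk_allele_freq: float) -> dict:
    """Expected genotype proportions under Hardy-Weinberg equilibrium."""
    if not 0.0 <= risk_allele_freq <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    p = risk_allele_freq   # risk allele
    q = 1.0 - p            # non-risk allele
    return {
        "risk_homozygous": p * p,
        "heterozygous": 2.0 * p * q,
        "non_risk_homozygous": q * q,
    }


# Approximate risk-allele frequencies quoted in the text.
variants = {
    "PNPLA3 rs738409 (G)": 0.30,
    "TM6SF2 rs58542926 (T)": 0.07,
}
for label, freq in variants.items():
    proportions = hardy_weinberg(freq)
    print(label, {k: round(v, 4) for k, v in proportions.items()})
```

Under these assumptions, homozygous carriers of the TM6SF2 risk allele are expected to be far rarer than homozygous carriers of the PNPLA3 risk allele, which partly explains why sample size matters so much when replicating associations for the less frequent variant.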
THE GENETIC SUSCEPTIBILITY IN LEAN NAFLD
Although the pathogenesis of NAFLD is not fully understood, a growing body of evidence supports the notion that the disease is strongly associated with MetS and its intermediate phenotypes, including obesity, type 2 diabetes, and cardiovascular disease [2,3,20,63].
Nevertheless, epidemiological studies, particularly those conducted in (but not restricted to) Asia [64-66], highlight the observation that NAFLD can also be seen in lean subjects. In fact, ~10−20% of all NAFLD cases in non-obese Americans and Asians are ascribed to lean NAFLD. As the results yielded by current studies are conflicting and inconclusive, the risk factors associated with lean NAFLD are still being revised. There are few published reports that include well-characterized NAFLD in non-obese patients. For instance, findings of a study involving a large number of lean patients from Asia with NAFLD proven by liver biopsy have been recently published, suggesting that while the risk factors in lean and obese NAFLD are common, lean NAFLD patients tend to have a less severe form of the disease and may have a better prognosis than obese patients. Still, among lean NAFLD patients, hypertriglyceridemia and higher creatinine were significantly associated with advanced liver disease. The most remarkable finding of this study is the lack of difference in the magnitude of the association between the PNPLA3 rs738409 variant and the disease severity in lean NAFLD patients in comparison with non-lean NAFLD patients. However, in their US-based study, Feldman and colleagues did not confirm this observation, possibly due to the small sample (including non-obese patients) studied. Finally, authors of a population-based study conducted in Hong Kong that included a large population of subjects reported that the rs738409 G allele was more common in non-obese than in obese NAFLD patients.
In conclusion, while the role of genetic factors associated with the risk of NAFLD in lean patients is not fully understood, it is plausible to speculate that variants associated with the risk of type 2 diabetes or insulin resistance [21,60], as well as variants in the mitochondrial DNA, are involved. At any rate, it is reasonable to assume that rs738409, which is not associated with either obesity or type 2 diabetes, has no differential influence on the genetic risk of NAFLD in lean individuals. In fact, meta-regression analysis (random effects model, within-study variance estimated with the unrestricted maximum-likelihood method) of BMI, fasting glucose and insulin levels, and HOMA-IR index according to the rs738409 variant (homozygous GG and CC) in pooled estimates including 1404 subjects from seven studies showed that the variant does not influence these traits.
THE ROLE OF GENETICS IN THE STRATIFICATION OF INDIVIDUALS AT RISK AND PREDICTION OF THERAPEUTIC INTERVENTION
The remarkable progress in the understanding of the genetic risk of NAFLD and NASH offers a unique opportunity to translate this information into clinical practice. Specifically, the design and identification of novel diagnostic tools, as well as promising therapeutic targets, would benefit from thorough knowledge of the individual genetic makeup of affected patients.
Several strategies were proposed for the application of genetic variants, particularly rs738409 in PNPLA3, in risk prediction, either for improving the diagnostic accuracy of NASH or for predicting non-invasively the response to lifestyle or surgical intervention (a detailed explanation is shown in Table 1).
Although a priori the use of genetic testing appears encouraging, the overall experience suggests that the value of utilizing SNPs for improving the efficacy in the diagnosis of NASH is still inconclusive. Authors of some studies support the introduction of rs738409 into diagnostic tests, while others do not advocate its use (Table 1). For example, authors of the NASH ClinLipMet Score suggested that the use of plasma metabolites (including glutamate, isoleucine, glycine, lysophosphatidylcholine 16:0, and phosphoethanolamine 40:6), along with routine biochemical tests (AST and fasting insulin) and rs738409 genotypes, predicts NASH with an area under the receiver operating characteristic curve of 0.866 (95% confidence interval, 0.820-0.913). Conversely, Kotronen and coworkers evaluated the performance of predicting NAFLD by combining routine clinical and laboratory data and the rs738409 genotypes, and observed a sensitivity of 86% and specificity of 71% in the estimation of increased liver fat content. However, the incorporation of the genetic information into the test score improved the accuracy of the prediction by less than 1%.
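The sensitivity and specificity reported above can be converted into likelihood ratios, which make it easier to judge how much a positive or negative score would shift the probability of increased liver fat in a given patient. The following is a minimal sketch; the function names and the 30% pre-test probability are hypothetical and serve only to illustrate the arithmetic.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple:
    """Positive and negative likelihood ratios of a diagnostic test."""
    if not (0.0 < specificity <= 1.0 and 0.0 <= sensitivity <= 1.0):
        raise ValueError("sensitivity and specificity must be proportions")
    if specificity == 1.0:
        raise ValueError("LR+ is undefined when specificity equals 1")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_prob: float, lr: float) -> float:
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


# Sensitivity and specificity quoted in the text for the combined score.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.86, specificity=0.71)

# Hypothetical pre-test probability, for illustration only.
pre_test = 0.30
print(round(lr_pos, 2), round(lr_neg, 2))
print(round(post_test_probability(pre_test, lr_pos), 2))
print(round(post_test_probability(pre_test, lr_neg), 2))
```

Because these ratios derive from the score as a whole, they do not isolate the contribution of the genotype, which, as noted above, improved the accuracy of the prediction by less than 1%.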
A more complex, yet poorly explored, scenario is the stratification of patients' response to medications or lifestyle intervention according to their genetic background. Authors of several extant studies have explored this issue, but further research is urgently needed. In particular, future prospective studies should involve large sample sizes, a proper design and long-term follow-up of the patients. Future research programs that integrate health records (ideally electronic ones) with patients' genomic, transcriptomic, proteomic, and metabolomic information will change our ability to control or cure the disease. For example, we recently showed that NASH is associated with a state of betaine insufficiency, and that a missense variant (rs1805074, p.Ser646Pro) in DMGDH (dimethylglycine dehydrogenase, mitochondrial) that modulates the levels of betaine and related metabolites is associated with the disease severity. We hope that the precise genetic knowledge gained with the tremendous advance in omics techniques will allow the medical community to fully benefit from the application of Personalized or Precision Medicine in the context of this relatively new "pandemic" disease.
This study was supported partially by grants PICT 2014-543, PICT 2014-1816 and PICT 2015-0551 (Agencia Nacional de Promoción Científica y Tecnológica
Conflicts of Interest: The authors have no conflicts to disclose.
GCKR: glucokinase gene regulator
EWAS: exome-wide association studies
GWAS: genome-wide association studies
MBOAT7: membrane bound O-acyltransferase domain containing 7
NAFLD: nonalcoholic fatty liver disease
PNPLA3: patatin-like phospholipase domain containing 3
SNP: single nucleotide polymorphism
TM6SF2: transmembrane 6 superfamily member 2