Gender classification in classical fiction: A computational analysis of 1113 fictions.
| Author | |
|---|---|
| Abstract | Recent decades have witnessed the rapid development of literary studies on gender and writing style. One common limitation of previous studies, which some researchers have already pointed out, is that they analyze only a few texts. In this study, we attempt to find the features that best facilitate the classification of texts by authorial gender, based on a corpus of 1113 classical fictions from the early 19th century to the early 20th century. Eight algorithms, including SVM, random forest, decision tree, AdaBoost, logistic regression, K-nearest neighbors, gradient boosting and XGBoost, are used to automatically select the features that are most useful for properly categorizing a text. We find that word frequency is the most important predictor for identifying authorial gender in classical fictions, achieving an accuracy rate of 92%. We also find that nationhood is not particularly impactful when dealing with authorial gender differences in classical fictions, as genderlectal variation is 'universal' in the English-speaking world. |
| Year of Publication | 2022 |
| Journal | Mathematical Biosciences and Engineering: MBE |
| Volume | 19 |
| Issue | 9 |
| Pages | 8892-8907 |
| Date Published | 2022 |
| ISSN Number | 1547-1063 |
| URL | https://www.aimspress.com/article/10.3934/mbe.2022412 |
| DOI | 10.3934/mbe.2022412 |
| Short Title | Math Biosci Eng |