Imbalanced dataset binary classification


I am new to ML and DS, and I have a dataset with a 9:1 class imbalance for a binary classification assignment. Could you please guide me in this regard? Also, which classifier is best for imbalanced binary classification?

Regards.


machine-learning classification binary-data unbalanced-classes







asked 14 hours ago by Sid_Mirza

  • Related: Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?
    – Stephan Kolassa
    6 hours ago


























1 Answer

You got off on the wrong foot by conceptualizing this as a classification problem. The fact that $Y$ is binary has nothing to do with trying to make classifications. And when the balance of $Y$ is far from 1:1 you need to think about modeling tendencies for $Y$, not modeling $Y$. In other words, the appropriate task is to estimate $P(Y=1 | X)$ using a model such as the binary logistic regression model. The logistic model is a direct probability estimator. Details may be found here and here.
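As a rough illustration of estimating $P(Y=1 \mid X)$ directly, here is a minimal sketch of a one-predictor logistic model fitted by plain gradient descent. The simulated data, the coefficients used to generate it, and the helper names `sigmoid` and `fit_logistic` are assumptions made for this example only; in practice you would use an established logistic regression routine from a statistics package.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.5, n_iter=1000):
    """Fit P(Y=1 | x) = sigmoid(b0 + b1 * x) by gradient descent on the mean log loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # derivative of the log loss w.r.t. the linear predictor
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Hypothetical simulated data with roughly a 9:1 imbalance.
random.seed(0)
xs, ys = [], []
for _ in range(1000):
    x = random.gauss(0.0, 1.0)
    p_true = sigmoid(-2.6 + 1.2 * x)  # assumed data-generating model
    xs.append(x)
    ys.append(1 if random.random() < p_true else 0)

b0, b1 = fit_logistic(xs, ys)
probs = [sigmoid(b0 + b1 * x) for x in xs]  # estimated P(Y=1 | x) for each row

prevalence = sum(ys) / len(ys)
mean_prob = sum(probs) / len(probs)
print(f"observed prevalence: {prevalence:.3f}")
print(f"mean predicted probability: {mean_prob:.3f}")
```

The output is a probability for each observation rather than a label. Because the model includes an intercept, the mean predicted probability matches the observed prevalence once the fit has converged, so the imbalance is reflected in the estimates without any resampling.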



Once you have a validated probability model and a utility/cost/loss function you can generate optimum decisions. The probabilities help to trade off the consequences of wrong decisions.
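To make the step from probabilities to decisions concrete, here is a small sketch under stated assumptions: there are two possible actions, the only costs are `cost_fp` (acting when $Y=0$) and `cost_fn` (not acting when $Y=1$), and the probabilities come from an already validated model. The function names and the numbers are hypothetical.

```python
def optimal_decision(p, cost_fp, cost_fn):
    """Return 1 (act) if acting has the lower expected cost given P(Y=1 | X) = p, else 0."""
    expected_cost_act = (1.0 - p) * cost_fp   # we act, but Y turns out to be 0
    expected_cost_wait = p * cost_fn          # we do not act, but Y turns out to be 1
    return 1 if expected_cost_act < expected_cost_wait else 0


def implied_threshold(cost_fp, cost_fn):
    """Probability above which acting is the cheaper choice."""
    return cost_fp / (cost_fp + cost_fn)


# Hypothetical predicted probabilities from a validated model.
predicted = [0.02, 0.08, 0.15, 0.60]

# Missing a positive is assumed to be 9 times as costly as a false alarm.
decisions_unequal = [optimal_decision(p, cost_fp=1.0, cost_fn=9.0) for p in predicted]

# With equal costs the implied threshold is 0.5.
decisions_equal = [optimal_decision(p, cost_fp=1.0, cost_fn=1.0) for p in predicted]

print(implied_threshold(1.0, 9.0), decisions_unequal)
print(implied_threshold(1.0, 1.0), decisions_equal)
```

The cutoff is derived from the costs, not from the class balance: changing the costs changes the decisions while the estimated probabilities stay the same.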







answered 13 hours ago by Frank Harrell











  • Thanks, Sir Frank Harrell. The features are floating-point values, but the target 'Y' is binary, as you said. I applied linear regression, random forests, a decision tree, and some ensemble methods; linear regression gave an AUC of 78.2%, whereas random forests and LightGBM performed better. Now I want to increase the AUC. Here is the list of parameters I used for lgb:
    – Sid_Mirza
    8 hours ago











  • params = {"objective": "binary", "metric": "auc", "boosting": "gbdt", "max_depth": -1, "num_leaves": 13, "learning_rate": 0.01, "bagging_freq": 5, "bagging_fraction": 0.4, "feature_fraction": 0.05, "min_data_in_leaf": 80, "min_sum_hessian_in_leaf": 10, "tree_learner": "serial", "boost_from_average": "false", "bagging_seed": random_state, "verbosity": 1, "seed": random_state}
    – Sid_Mirza
    8 hours ago

























