Identify and count spells (Distinctive events within each group)

I'm looking for an efficient way to identify spells/runs in a time series. In the image below, the first three columns are what I have; the fourth column, spell, is what I'm trying to compute. I've tried using dplyr's lead and lag, but that gets too complicated. I've also tried rle but got nowhere.



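[Image: table with the columns time, group and is.5, plus the desired spell column, which numbers each run of consecutive is.5 == 1 rows within a group and is 0 elsewhere]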



ReprEx



df <- structure(list(time = structure(c(1538876340, 1538876400, 
1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800,
1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))


I prefer a tidyverse solution.



Assumptions



  1. Data is sorted by group and then by time


  2. There are no gaps in time within each group
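Both assumptions can be checked before computing spells. The following is a minimal base-R sketch on a few rows taken from the sample data; the 60-second step is an assumption read off that sample, and the variable names are only illustrative.

```r
# Sketch: verify "sorted by group, then time" and "no gaps" on a small example.
# The one-minute (60 s) step is an assumption taken from the sample data.
time  <- as.POSIXct(c(1538876340, 1538876400, 1538876460, 1526824800, 1526824860),
                    origin = "1970-01-01", tz = "UTC")
group <- c("A", "A", "A", "B", "B")

# seconds between consecutive rows, computed within each group
step_secs <- tapply(as.numeric(time), group, diff)

sorted_ok <- !is.unsorted(group) && all(unlist(step_secs) > 0)  # groups and times ascending
no_gaps   <- all(unlist(step_secs) == 60)                       # every step is one minute
print(c(sorted = sorted_ok, no_gaps = no_gaps))
```

If either check fails on the real data, sorting with arrange(group, time) or filling the missing timestamps would be needed before any of the run-based approaches can be trusted.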





Update



Thanks for the contributions. I've timed some of the proposed approaches on the full data (n=2,583,360)



  1. the rle approach by @markus took 0.53 seconds

  2. the cumsum approach by @M-M took 2.85 seconds

  3. the function approach by @MrFlick took 0.66 seconds

  4. the rle and dense_rank approach by @tmfmnk took 0.89 seconds
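Timings like these depend on the machine and on the data, so they are best reproduced locally. As a rough sketch of how two of the ideas could be compared with system.time(), here is a run on a synthetic 0/1 vector rather than the real data. The function names via_rle and via_cumsum are made up for this example, and the cumsum version uses base head() in place of dplyr's lag().

```r
set.seed(1)                                  # arbitrary seed, for reproducibility only
flag <- rbinom(1e5, size = 1, prob = 0.5)    # synthetic 0/1 series, not the asker's data

# run-length approach: number the runs of 1s, leave runs of 0s at 0
via_rle <- function(x) {
  r <- rle(x)
  r$values <- cumsum(r$values) * r$values
  inverse.rle(r)
}

# cumulative-sum approach: count the rows where a run of 1s begins
via_cumsum <- function(x) {
  x * cumsum(c(0, head(x, -1)) != x & x != 0)
}

print(system.time(a <- via_rle(flag)))
print(system.time(b <- via_cumsum(flag)))
print(identical(as.numeric(a), as.numeric(b)))   # both should label the spells identically
```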






r dataframe dplyr time-series tidyverse






asked 7 hours ago by Thomas Speidel (edited 2 hours ago)







  • For someone who is not familiar with how the spell is computed, can you share a formula or description? – nsinghs, 7 hours ago

  • @nsinghs I think they mean "hospital spell" – zx8754, 7 hours ago












6 Answers














One option using rle



library(dplyr)
df %>%
  group_by(group) %>%
  mutate(
    spell = {
      r <- rle(is.5)                            # run-length encode the flag
      r$values <- cumsum(r$values) * r$values   # number the runs of 1s, keep 0s at 0
      inverse.rle(r)                            # expand back to one value per row
    }
  )
# A tibble: 14 x 4
# Groups:   group [2]
#    time                group  is.5 spell
#    <dttm>              <chr> <dbl> <dbl>
#  1 2018-10-07 01:39:00 A         0     0
#  2 2018-10-07 01:40:00 A         1     1
#  3 2018-10-07 01:41:00 A         1     1
#  4 2018-10-07 01:42:00 A         0     0
#  5 2018-10-07 01:43:00 A         1     2
#  6 2018-10-07 01:44:00 A         0     0
#  7 2018-10-07 01:45:00 A         0     0
#  8 2018-10-07 01:46:00 A         1     3
#  9 2018-05-20 14:00:00 B         0     0
# 10 2018-05-20 14:01:00 B         0     0
# 11 2018-05-20 14:02:00 B         1     1
# 12 2018-05-20 14:03:00 B         1     1
# 13 2018-05-20 14:04:00 B         0     0
# 14 2018-05-20 14:05:00 B         1     2


Explanation



When we call



r <- rle(df$is.5)


the result we get is



r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1


We need to replace values with a running count of the runs of 1s wherever values == 1, while leaving the 0s as they are.

We can achieve this by multiplying cumsum(r$values) by r$values, where the latter is a vector of 0s and 1s:



r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5


Finally we call inverse.rle to get back a vector of the same length as is.5.



inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5


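Inside mutate(), after group_by(group), this is done separately for every group, so the numbering restarts at 1 in each group (unlike the ungrouped df$is.5 example above, which counts up to 5).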
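The three steps above can be collected into one small function. This is only a sketch: the name spell_id is made up here (it is not from the answer), it assumes the flag vector holds only 0s and 1s with no NA, and it uses base R's ave() for the grouping so that it runs without dplyr.

```r
# Hypothetical helper (name not from the answer): number the runs of 1s in a 0/1 vector.
# Assumes `flag` contains only 0 and 1, with no missing values.
spell_id <- function(flag) {
  r <- rle(flag)                             # run-length encode the flags
  r$values <- cumsum(r$values) * r$values    # k-th run of 1s becomes k, runs of 0s stay 0
  inverse.rle(r)                             # expand back to the original length
}

is.5  <- c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)   # flags from the question
group <- rep(c("A", "B"), times = c(8, 6))              # group labels from the question

# ave() applies the function within each group, so the numbering restarts per group
spell <- ave(is.5, group, FUN = spell_id)
print(spell)
```

With dplyr, the same helper could be called as mutate(spell = spell_id(is.5)) after group_by(group).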






answered 7 hours ago by markus (edited 5 hours ago)




















  • I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers. – M-M, 5 hours ago

  • @M-M Added some explanation. Thanks for the comment. – markus, 5 hours ago
































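Here's a helper function that returns what you are after:

spell_index <- function(time, flag) {
  # TRUE where a spell starts: the flag is 1 and the previous row, one time step earlier, was not 1
  change <- time - lag(time) == 1 & flag == 1 & lag(flag) != 1
  # count the starts so far, then zero out the rows that are not part of a spell
  cumsum(change) * (flag == 1) + 0
}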



You can use it with your data like this:

library(dplyr)
df %>%
  group_by(group) %>%
  mutate(
    spell = spell_index(time, is.5)
  )

Basically, the helper function uses lag() to look for changes. We use cumsum() to count those changes, then multiply by a logical vector to zero out the rows you want to be zero.
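One caveat with lag(): it returns NA for the first row of each group, so if a group starts with flag == 1 the comparison yields NA and cumsum() carries that NA to every later row. A possible variant, sketched below, pads the lagged flag with 0 instead. It relies on the question's "no gaps in time" assumption, so the time check is dropped. The name spell_index2 is hypothetical, and base head() is used in place of dplyr's lag() to keep the example self-contained.

```r
# Hypothetical variant of the helper above (not the answerer's code).
# Assumes rows are sorted by time with no gaps, so only the flag is inspected.
spell_index2 <- function(flag) {
  prev   <- c(0, head(flag, -1))     # previous flag, with 0 standing in before the first row
  change <- flag == 1 & prev != 1    # TRUE exactly where a run of 1s begins
  cumsum(change) * (flag == 1)       # running count of starts, zeroed outside spells
}

print(spell_index2(c(1, 1, 0, 1)))               # a series that starts inside a spell
print(spell_index2(c(0, 1, 1, 0, 1, 0, 0, 1)))   # group A from the question
```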






answered 7 hours ago by MrFlick






























This works.

The data:

df <- structure(list(time = structure(c(1538876340, 1538876400,
1538876460, 1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800,
1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))

We split our data by group:

df2 <- split(df, df$group)

Build a function we can apply to the list:



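library(dplyr)
# NOTE: the start of this function was lost from the page; only the tail of the ifelse()
# call survived. The mutate() step below is a reconstruction that fits that tail.
my_func <- function(dat) {
  rst <- dat %>% mutate(change = diff(c(0, is.5)), flag = as.numeric(change == 1),
    spell = ifelse(is.5 == 0 | change == -1, 0, cumsum(flag))) %>%
    dplyr::select(time, group, is.5, spell)
  return(rst)
}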



Then apply it:

l <- lapply(df2, my_func)

We can now turn this list back into a data frame:

do.call(rbind.data.frame, l)





































A somewhat different possibility could be:

library(dplyr)
df %>%
  group_by(group) %>%
  mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
  group_by(group, is.5) %>%
  mutate(spell = dense_rank(spell)) %>%
  ungroup() %>%
  mutate(spell = ifelse(is.5 == 0, 0, spell))

#    time                group  is.5 spell
#    <dttm>              <chr> <dbl> <dbl>
#  1 2018-10-07 01:39:00 A         0     0
#  2 2018-10-07 01:40:00 A         1     1
#  3 2018-10-07 01:41:00 A         1     1
#  4 2018-10-07 01:42:00 A         0     0
#  5 2018-10-07 01:43:00 A         1     2
#  6 2018-10-07 01:44:00 A         0     0
#  7 2018-10-07 01:45:00 A         0     0
#  8 2018-10-07 01:46:00 A         1     3
#  9 2018-05-20 14:00:00 B         0     0
# 10 2018-05-20 14:01:00 B         0     0
# 11 2018-05-20 14:02:00 B         1     1
# 12 2018-05-20 14:03:00 B         1     1
# 13 2018-05-20 14:04:00 B         0     0
# 14 2018-05-20 14:05:00 B         1     2
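The pipeline comes without an explanation, so here is a rough base-R trace of what each step produces for group A. This is an illustration only: match(x, sort(unique(x))) is used as a stand-in for dplyr's dense_rank(), ave() stands in for the second group_by(), and the variable names are made up.

```r
is.5 <- c(0, 1, 1, 0, 1, 0, 0, 1)             # group A from the question

# Step 1: give every run (of 0s or of 1s) its own consecutive id
runs   <- rle(is.5)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Step 2: rank the run ids separately for the 0 rows and for the 1 rows
# (the role played by dense_rank() after group_by(group, is.5))
rank_within <- ave(run_id, is.5, FUN = function(x) match(x, sort(unique(x))))

# Step 3: rows with is.5 == 0 are not part of any spell
spell <- ifelse(is.5 == 0, 0, rank_within)

print(rbind(is.5, run_id, rank_within, spell))
```

Ranking within the is.5 == 1 rows is what turns the run ids 2, 4, 6 into the spell numbers 1, 2, 3.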



































One option is to use cumsum:

library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>%
  mutate(spell = is.5 * cumsum(c(0, lag(is.5)[-1]) != is.5 & is.5 != 0))

# A tibble: 14 x 4
# Groups:   group [2]
#    time                group  is.5 spell
#    <dttm>              <chr> <dbl> <dbl>
#  1 2018-10-07 01:39:00 A         0     0
#  2 2018-10-07 01:40:00 A         1     1
#  3 2018-10-07 01:41:00 A         1     1
#  4 2018-10-07 01:42:00 A         0     0
#  5 2018-10-07 01:43:00 A         1     2
#  6 2018-10-07 01:44:00 A         0     0
#  7 2018-10-07 01:45:00 A         0     0
#  8 2018-10-07 01:46:00 A         1     3
#  9 2018-05-20 14:00:00 B         0     0
# 10 2018-05-20 14:01:00 B         0     0
# 11 2018-05-20 14:02:00 B         1     1
# 12 2018-05-20 14:03:00 B         1     1
# 13 2018-05-20 14:04:00 B         0     0
# 14 2018-05-20 14:05:00 B         1     2

The first condition, c(0, lag(is.5)[-1]) != is.5, is TRUE whenever is.5 differs from the previous row, so on its own it would start a new id (i.e. spell) at every change. We do not want a new id on rows where is.5 equals 0, which is why cumsum also gets the second condition, is.5 != 0.

However, that second condition only stops the id from increasing on those rows (adding 1 to the previous id); it does not set the id to 0. That is why the result is multiplied by is.5.
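To make the two rules concrete, the intermediate vectors for group A can be printed one at a time. This is just an illustration in base R; the variable names (prev, changed, starts, ids) are introduced here and are not part of the answer.

```r
is.5 <- c(0, 1, 1, 0, 1, 0, 0, 1)    # group A from the question

prev    <- c(0, head(is.5, -1))      # same values as c(0, lag(is.5)[-1]) with dplyr's lag()
changed <- prev != is.5              # TRUE wherever the value differs from the previous row
starts  <- changed & is.5 != 0       # keep only the changes that land on a 1
ids     <- cumsum(starts)            # running count of spell starts
spell   <- is.5 * ids                # zero out the rows where is.5 is 0

print(rbind(is.5, changed, starts, ids, spell))
```

The ids row shows why the final multiplication is needed: after a spell ends, the running count stays at its last value until the next spell starts.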




































Here is one option with rleid from data.table:

  1. Convert the data.frame to a data.table with setDT(df).

  2. Grouped by group, take the run-length id (rleid) of is.5 and multiply it by is.5, so that the ids on rows where is.5 is 0 become 0. Assign the result to spell.

  3. In i, use a logical vector to select the rows where spell is not zero; then, again within each group, match those spell values against their unique values and assign the result back to spell.

library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), by = group
  ][!!spell, spell := match(spell, unique(spell)), by = group][]  # by = group: numbering restarts per group
#                    time group is.5 spell
#  1: 2018-10-07 01:39:00     A    0     0
#  2: 2018-10-07 01:40:00     A    1     1
#  3: 2018-10-07 01:41:00     A    1     1
#  4: 2018-10-07 01:42:00     A    0     0
#  5: 2018-10-07 01:43:00     A    1     2
#  6: 2018-10-07 01:44:00     A    0     0
#  7: 2018-10-07 01:45:00     A    0     0
#  8: 2018-10-07 01:46:00     A    1     3
#  9: 2018-05-20 14:00:00     B    0     0
# 10: 2018-05-20 14:01:00     B    0     0
# 11: 2018-05-20 14:02:00     B    1     1
# 12: 2018-05-20 14:03:00     B    1     1
# 13: 2018-05-20 14:04:00     B    0     0
# 14: 2018-05-20 14:05:00     B    1     2

Or, after the first step, use .GRP. Because this groups by the spell value across the whole table, it reproduces the per-group numbering only when the groups share the same run ids, as they do in this example:

df[!!spell, spell := .GRP, by = spell]
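The renumbering step is worth checking on data where the groups do not share the same run ids. The base-R sketch below uses made-up data in which group B starts with a 1, and compares renumbering over the whole table with renumbering within each group. The variable names are hypothetical, and ave() stands in for data.table's by.

```r
# Counterexample (made-up data): group B starts with a 1, so its run ids differ from A's
group <- c("A", "A", "B", "B", "B")
is.5  <- c(0, 1, 1, 0, 1)

# First step in base R: run id within each group, zeroed where is.5 is 0
run_id <- ave(is.5, group, FUN = function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
})
spell <- run_id * is.5

# Renumber the non-zero ids over the whole table ...
nz     <- spell != 0
global <- replace(spell, nz, match(spell[nz], unique(spell[nz])))

# ... versus renumbering them separately inside each group
per_group <- ave(spell, group, FUN = function(s) {
  k <- s != 0
  s[k] <- match(s[k], unique(s[k]))
  s
})

print(rbind(spell, global, per_group))
```

Only the per-group version restarts at 1 for group B; the whole-table version numbers B's first spell 2.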





            4














            Here's a helper function that can return what you are after



            spell_index <- function(time, flag) 
            change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
            cumsum(change) * (flag==1)+0



            And you can use it with your data like



            library(dplyr)
            df %>%
            group_by(group) %>%
            mutate(
            spell = spell_index(time, is.5)
            )


            Basically the helper functions uses lag() to look for changes. We use cumsum() to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.






answered 7 hours ago

MrFlick




This works.

The data:

    df <- structure(list(
      time = structure(c(1538876340, 1538876400, 1538876460, 1538876520,
                         1538876580, 1538876640, 1538876700, 1538876760,
                         1526824800, 1526824860, 1526824920, 1526824980,
                         1526825040, 1526825100),
                       class = c("POSIXct", "POSIXt"), tzone = "UTC"),
      group = c("A", "A", "A", "A", "A", "A", "A", "A",
                "B", "B", "B", "B", "B", "B"),
      is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)),
      class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))

We split our data by group:

    df2 <- split(df, df$group)

Build a function we can apply to the list:

    library(dplyr)

    # The opening of this body was lost in extraction; `change` and `flag`
    # are reconstructed (an assumption) to fit the surviving ifelse() call.
    my_func <- function(dat) {
      rst <- dat %>%
        mutate(change = diff(c(0, is.5)), flag = as.integer(change == 1),
               spell = ifelse(is.5 == 0 | change == -1, 0, cumsum(flag))) %>%
        dplyr::select(time, group, is.5, spell)
      return(rst)
    }

Then apply it:

    l <- lapply(df2, my_func)

We can now turn this list back into a data frame:

    do.call(rbind.data.frame, l)
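The split, apply and combine steps above can also be sketched without any packages. This is an illustrative base-R version, not the answer's own function: `demo` and `number_spells` are hypothetical names, and the per-group rule used here (a spell starts wherever diff() of the flag, padded with a leading 0, equals 1) is an assumption chosen because it is easy to check by hand.

```r
# Small stand-in for the question's data: two groups with the same flags.
demo <- data.frame(
  group = rep(c("A", "B"), times = c(8, 6)),
  is.5  = c(0, 1, 1, 0, 1, 0, 0, 1,   0, 0, 1, 1, 0, 1)
)

# Number the spells inside one group's rows.
number_spells <- function(d) {
  starts <- diff(c(0, d$is.5)) == 1      # TRUE on the first row of each spell
  d$spell <- cumsum(starts) * d$is.5     # running count, zeroed outside spells
  d
}

pieces   <- split(demo, demo$group)        # split: one data frame per group
labelled <- lapply(pieces, number_spells)  # apply: label each piece
out      <- do.call(rbind, labelled)       # combine: back to one data frame
print(out)
```

Because each piece is processed separately, the counter restarts at 1 in every group without any explicit reset.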





edited 7 hours ago

answered 7 hours ago

Hector Haffenden







A somewhat different possibility could be:

    library(dplyr)

    df %>%
      group_by(group) %>%
      mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
      group_by(group, is.5) %>%
      mutate(spell = dense_rank(spell)) %>%
      ungroup() %>%
      mutate(spell = ifelse(is.5 == 0, 0, spell))

       time                group  is.5 spell
       <dttm>              <chr> <dbl> <dbl>
     1 2018-10-07 01:39:00 A         0     0
     2 2018-10-07 01:40:00 A         1     1
     3 2018-10-07 01:41:00 A         1     1
     4 2018-10-07 01:42:00 A         0     0
     5 2018-10-07 01:43:00 A         1     2
     6 2018-10-07 01:44:00 A         0     0
     7 2018-10-07 01:45:00 A         0     0
     8 2018-10-07 01:46:00 A         1     3
     9 2018-05-20 14:00:00 B         0     0
    10 2018-05-20 14:01:00 B         0     0
    11 2018-05-20 14:02:00 B         1     1
    12 2018-05-20 14:03:00 B         1     1
    13 2018-05-20 14:04:00 B         0     0
    14 2018-05-20 14:05:00 B         1     2





answered 6 hours ago

tmfmnk







One option is to use cumsum():

    library(dplyr)

    df %>%
      group_by(group) %>%
      arrange(group, time) %>%
      mutate(spell = is.5 * cumsum(c(0, lag(is.5)[-1]) != is.5 & is.5 != 0))

    # # A tibble: 14 x 4
    # # Groups:   group [2]
    #    time                group  is.5 spell
    #    <dttm>              <chr> <dbl> <dbl>
    #  1 2018-10-07 01:39:00 A         0     0
    #  2 2018-10-07 01:40:00 A         1     1
    #  3 2018-10-07 01:41:00 A         1     1
    #  4 2018-10-07 01:42:00 A         0     0
    #  5 2018-10-07 01:43:00 A         1     2
    #  6 2018-10-07 01:44:00 A         0     0
    #  7 2018-10-07 01:45:00 A         0     0
    #  8 2018-10-07 01:46:00 A         1     3
    #  9 2018-05-20 14:00:00 B         0     0
    # 10 2018-05-20 14:01:00 B         0     0
    # 11 2018-05-20 14:02:00 B         1     1
    # 12 2018-05-20 14:03:00 B         1     1
    # 13 2018-05-20 14:04:00 B         0     0
    # 14 2018-05-20 14:05:00 B         1     2

The condition c(0, lag(is.5)[-1]) != is.5 takes care of assigning a new id (i.e. spell) whenever is.5 changes. We do not want a new id when the change is to a row where is.5 equals 0, which is why cumsum() has the second rule, is.5 != 0.

However, that second rule only prevents the id from being incremented (adding 1 to the previous id); it does not set the id to 0 on those rows. That is why the result is multiplied by is.5.
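The two rules and the final multiplication are easier to follow when the intermediate vectors are written out. The sketch below does that in base R for the flags of group A only; `is5`, `prev`, `starts`, `ids` and `spell` are illustrative names, and `prev` is assumed to be equivalent to c(0, lag(is.5)[-1]) from the answer (the previous value, with 0 before the first row).

```r
is5 <- c(0, 1, 1, 0, 1, 0, 0, 1)   # flags of group A from the question

# Previous value of the flag, with 0 standing in before the first row.
prev <- c(0, head(is5, -1))

# Rule 1: the flag changed. Rule 2: it changed to a non-zero value.
starts <- prev != is5 & is5 != 0

# Running count of starts; rows after a spell still carry the last id here.
ids <- cumsum(starts)

# Multiplying by the flag resets the id to 0 outside the spells.
spell <- is5 * ids

print(rbind(is5, prev, starts, ids, spell))
```

The `ids` row shows why the multiplication is needed: without it, row 4 would keep id 1 and rows 6 and 7 would keep id 2 even though their flag is 0.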






answered 5 hours ago

M-M







Here is one option with rleid from data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)). Grouped by 'group', get the run-length id (rleid) of 'is.5' and multiply it by the values of 'is.5', so that the ids corresponding to 0s in 'is.5' become 0, and assign the result to 'spell'. Then specify i with a logical vector (!!spell) to select the rows whose 'spell' is not zero, match those values of 'spell' against the unique 'spell' values, and assign the result back to 'spell'. The second step is grouped by 'group' as well, so that the renumbering restarts in every group even when the groups do not begin with the same 'is.5' value.

    library(data.table)
    setDT(df)[, spell := rleid(is.5) * as.integer(is.5), by = group
              ][!!spell, spell := match(spell, unique(spell)), by = group][]
    #                    time group is.5 spell
    #  1: 2018-10-07 01:39:00     A    0     0
    #  2: 2018-10-07 01:40:00     A    1     1
    #  3: 2018-10-07 01:41:00     A    1     1
    #  4: 2018-10-07 01:42:00     A    0     0
    #  5: 2018-10-07 01:43:00     A    1     2
    #  6: 2018-10-07 01:44:00     A    0     0
    #  7: 2018-10-07 01:45:00     A    0     0
    #  8: 2018-10-07 01:46:00     A    1     3
    #  9: 2018-05-20 14:00:00     B    0     0
    # 10: 2018-05-20 14:01:00     B    0     0
    # 11: 2018-05-20 14:02:00     B    1     1
    # 12: 2018-05-20 14:03:00     B    1     1
    # 13: 2018-05-20 14:04:00     B    0     0
    # 14: 2018-05-20 14:05:00     B    1     2

Or, after the first step, use .GRP:

    df[!!spell, spell := .GRP, spell]

Note that this variant numbers the distinct non-zero ids across the whole table rather than within each 'group'. It reproduces the output above because the ids from the first step coincide between the two groups in this data.
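For readers without data.table, the same run-length idea can be sketched in base R. This is an analogue under stated assumptions, not the answer's code: rle() plus rep() stands in for rleid(), ave() stands in for the by-group evaluation, and `spell_ids` and `demo` are hypothetical names. Group B deliberately starts with a flagged row to show that the renumbering is done within each group.

```r
# Number the runs of non-zero flags 1, 2, 3, ... and give 0 to the other rows.
spell_ids <- function(flag) {
  runs <- rle(flag)
  # Run-length id of every row (1 for the first run, 2 for the second, ...),
  # multiplied by a logical so rows with flag 0 get id 0.
  id <- rep(seq_along(runs$lengths), runs$lengths) * (flag != 0)
  keep <- id != 0
  # Renumber the surviving ids consecutively in order of first appearance.
  id[keep] <- match(id[keep], unique(id[keep]))
  id
}

demo <- data.frame(
  group = c("A", "A", "A", "A", "A", "B", "B", "B"),
  is.5  = c(0, 1, 1, 0, 1, 1, 0, 1)
)

# ave() applies spell_ids() separately to the flags of each group.
demo$spell <- ave(demo$is.5, demo$group, FUN = spell_ids)
print(demo)
```

In group A the raw ids 2 and 4 are renumbered to 1 and 2; in group B the raw ids 1 and 3 are also renumbered to 1 and 2, which is the behaviour a per-group match() is meant to give.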





edited 1 hour ago

answered 1 hour ago

akrun



























                                                    draft saved

                                                    draft discarded





































































